Tucano 2 Cool: Better Open Source LLMs for Portuguese

About

We present Tucano 2, a fully open suite of large language models (LLMs) with 0.5-3.7 billion parameters, designed to address certain gaps in open-source development for Portuguese LLMs. Building on our previous work, we extend our dataset, GigaVerbo-v2, to a new degree of quality and scale. We also introduce a new synthetic dataset, GigaVerbo-v2 Synth, aimed at filling gaps in GigaVerbo-v2, and two post-training datasets, GigaVerbo-v2 SFT and GigaVerbo-v2 Preferences, that allow Portuguese LLMs to be trained in domains such as retrieval-augmented generation, coding, tool use, and chain-of-thought reasoning, among others. Through extensive ablation studies, we design both pretraining and continual pretraining recipes for the Tucano 2 suite (Base, Instruct, and Think), which achieve state-of-the-art performance on several Portuguese-language modeling benchmarks. We also extend and refine the evaluation harness introduced in our earlier work, yielding a comprehensive evaluation suite that provides strong signals across different pretraining, continual pretraining, and post-training regimes. All artifacts associated with Tucano 2 are openly released, including training recipes, logs, and source code, ensuring that our work is reproducible, accessible, and extendable by the broader Portuguese NLP community.

Nicholas Kluge Corrêa, Aniket Sen, Shiza Fatimah, Sophia Falk, Lennard Landgraf, Julia Kastner, Lucie Flek • 2026

Related benchmarks

Task	Dataset	Result
Language Modeling	Portuguese evaluation suite (test)	NPM 20.63	27
Language Modeling	Portuguese Evaluation Suite Hard Set	NPM 0.99	15
Language Modeling	Portuguese Evaluation Suite Total	NPM 20.64	15
Language Modeling	Portuguese Evaluation Suite Easy Set	NPM 39.93	15
General Language Capability	Aggregate K&R, IFEval-PT, HumanEval	Average Score 53.64	14
Knowledge & Reasoning	ARC-Challenge, ENEM, BLUEX, OAB Exams, BELEBELE, MMLU, GSM8K-PT	K&R Score (NPM) 56.22	14
Coding	HumanEval	Coding Score 47.56	14
Instruction Following	IFEval-PT	Instruction Score 41.67	14
Long-context reasoning and retrieval	RULER-PT (aggregate)	RULER-PT (Aggregate) Score @ 1024 Context 81.7	9
Natural Language Understanding	Portuguese Benchmarks Easy Set	NPM 40.28	8

Showing 10 of 13 rows
