AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese
About
Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant's linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid- and post-training stages. To evaluate pt-PT more faithfully, we release a suite of pt-PT benchmarks that includes translated standard tasks and four new datasets targeting pt-PT generation, linguistic competence, and pt-PT/pt-BR bias. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and native benchmarking for European Portuguese.
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Reasoning | BBH | Accuracy | 50.3 | 672 |
| Instruction Following | IFEval | Accuracy | 61.6 | 625 |
| Multitask Language Understanding | MMLU | Accuracy | 58.8 | 413 |
| Question Answering | TriviaQA | Accuracy | 63.5 | 238 |
| Science Question Answering | ARC-C | Accuracy | 78.9 | 193 |
| Safety Evaluation | AdvBench | -- | -- | 117 |
| Social Commonsense Reasoning | SIQA | Accuracy | 46.3 | 89 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 72.5 | 45 |
| Language Generation | P3B3 | General Score | 95.9 | 14 |
| Portuguese Educational Proficiency | PT-C | Accuracy | 71.4 | 14 |
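Most scores in the table above are plain accuracy, i.e. the percentage of model answers that exactly match the gold labels. A minimal sketch of that computation (the example predictions are hypothetical, not actual AMALIA outputs):

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions that exactly match the reference answers."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# Hypothetical multiple-choice outputs for illustration only.
preds = ["A", "C", "B", "D"]
golds = ["A", "B", "B", "D"]
print(f"{accuracy(preds, golds):.1f}")  # → 75.0
```

Benchmarks with other metrics (e.g. the P3B3 "General Score") aggregate task-specific judgments rather than exact-match accuracy.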