
AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese

About

Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant's linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid- and post-training stages. To evaluate pt-PT more faithfully, we release a suite of pt-PT benchmarks that includes translated standard tasks and four new datasets targeting pt-PT generation, linguistic competence, and pt-PT/pt-BR bias. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and native benchmarking for European Portuguese.

Afonso Simplício, Gonçalo Vinagre, Miguel Moura Ramos, Diogo Tavares, Rafael Ferreira, Giuseppe Attanasio, Duarte M. Alves, Inês Calvo, Inês Vieira, Rui Guerra, James Furtado, Beatriz Canaverde, Iago Paulo, Vasco Ramos, Diogo Glória-Silva, Miguel Faria, Marcos Treviso, Daniel Gomes, Pedro Gomes, David Semedo, André Martins, João Magalhães • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reasoning | BBH | Accuracy | 50.3 | 672 |
| Instruction Following | IFEval | IFEval Accuracy | 61.6 | 625 |
| Multitask Language Understanding | MMLU | Accuracy | 58.8 | 413 |
| Question Answering | TriviaQA | Accuracy | 63.5 | 238 |
| Science Question Answering | ARC-C | Accuracy | 78.9 | 193 |
| Safety Evaluation | AdvBench | -- | -- | 117 |
| Social Commonsense Reasoning | SIQA | Accuracy | 46.3 | 89 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 72.5 | 45 |
| Language Generation | P3B3 | General Score | 95.9 | 14 |
| Portuguese Educational Proficiency | PT-C | Accuracy | 71.4 | 14 |

(Showing 10 of 16 rows.)
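Most of the scores above are plain exact-match accuracy over a benchmark's question set. As an illustrative sketch (not the authors' actual evaluation harness), the metric reduces to the fraction of model answers that match the gold answers:

```python
def accuracy(predictions, references):
    """Percentage of exact matches between model answers and gold answers."""
    if not references:
        raise ValueError("references must be non-empty")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# Toy example with hypothetical multiple-choice answers:
# 3 of 4 predictions match the gold labels.
preds = ["A", "C", "B", "D"]
gold = ["A", "C", "B", "B"]
print(f"{accuracy(preds, gold):.1f}")  # prints 75.0
```

Generation-oriented entries such as the P3B3 "General Score" aggregate multiple quality dimensions rather than exact matches, so they are not directly comparable to the accuracy rows.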
