AMALIA Technical Report: A Fully Open Source Large Language Model for European Portuguese
About
Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant's linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid- and post-training stages. To evaluate pt-PT more faithfully, we release a suite of pt-PT benchmarks that includes translated standard tasks and four new datasets targeting pt-PT generation, linguistic competence, and pt-PT/pt-BR bias. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and native benchmarking for European Portuguese.
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Reasoning | BBH | Accuracy | 50.3 | 672 |
| Instruction Following | IFEval | Accuracy | 61.6 | 625 |
| Multitask Language Understanding | MMLU | Accuracy | 58.8 | 413 |
| Question Answering | TriviaQA | Accuracy | 63.5 | 238 |
| Science Question Answering | ARC-C | Accuracy | 78.9 | 193 |
| Safety Evaluation | AdvBench | -- | -- | 117 |
| Social Commonsense Reasoning | SIQA | Accuracy | 46.3 | 89 |
| Physical Commonsense Reasoning | PIQA | Accuracy | 72.5 | 45 |
| Language Generation | P3B3 | General Score | 95.9 | 14 |
| Portuguese Educational Proficiency | PT-C | Accuracy | 71.4 | 14 |
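Most scores in the table above are plain accuracy, i.e. the percentage of model answers that exactly match the gold labels. A minimal sketch of that computation (the example predictions are hypothetical, not actual AMALIA outputs):

```python
def accuracy(predictions: list[str], references: list[str]) -> float:
    """Percentage of predictions that exactly match the reference answers."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)

# Hypothetical multiple-choice outputs for illustration only.
preds = ["A", "C", "B", "D"]
golds = ["A", "B", "B", "D"]
print(f"{accuracy(preds, golds):.1f}")  # → 75.0
```

Benchmarks with other metrics (e.g. the P3B3 "General Score") aggregate task-specific judgments rather than exact-match accuracy.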