
Salamandra Technical Report

About

This work introduces Salamandra, a suite of open-source decoder-only large language models available in three sizes: 2, 7, and 40 billion parameters. The models were trained from scratch on highly multilingual data comprising text in 35 European languages as well as code. Our carefully curated corpus is built exclusively from open-access data compiled from a wide variety of sources. Along with the base models, we release supplementary checkpoints fine-tuned on public-domain instruction data for chat applications. Additionally, we share our preliminary experiments on multimodality, which serve as a proof of concept to showcase potential applications for the Salamandra family. Our extensive evaluations on multilingual benchmarks reveal that Salamandra has strong capabilities, achieving competitive performance when compared to similarly sized open-source models. We provide comprehensive evaluation results both on standard downstream tasks and on key aspects related to bias and safety.

With this technical report, we intend to promote open science by sharing all the details behind our design choices, data curation strategy, and evaluation methodology. In addition, we deviate from the usual practice by making our training and evaluation scripts publicly accessible. We release all models under a permissive Apache 2.0 license in order to foster future research and facilitate commercial use, thereby contributing to the open-source ecosystem of large language models.

Aitor Gonzalez-Agirre, Marc Pàmies, Joan Llop, Irene Baucells, Severino Da Dalt, Daniel Tamayo, José Javier Saiz, Ferran Espuña, Jaume Prats, Javier Aula-Blasco, Mario Mina, Iñigo Pikabea, Adrián Rubio, Alexander Shvets, Anna Sallés, Iñaki Lacunza, Jorge Palomar, Júlia Falcão, Lucía Tormo, Luis Vasquez-Reina, Montserrat Marimon, Oriol Pareras, Valle Ruiz-Fernández, Marta Villegas • 2025
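As a starting point for the released checkpoints, the snippet below shows one way to load an instruction-tuned Salamandra model for chat with the Hugging Face transformers library. It is a minimal sketch: the repository id BSC-LT/salamandra-7b-instruct, the bf16 setting, and the prompt are illustrative assumptions, not details taken from this page.

```python
# Minimal sketch: load an instruction-tuned Salamandra checkpoint for chat.
# "BSC-LT/salamandra-7b-instruct" is an assumed Hugging Face repository id;
# substitute the id of whichever released checkpoint you want to use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BSC-LT/salamandra-7b-instruct"  # assumption, see note above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~2 bytes/param: roughly 14 GB for the 7B model
    device_map="auto",           # place layers on the available GPU(s)/CPU
)

# Instruction-tuned checkpoints are meant to be prompted via a chat template.
messages = [{"role": "user", "content": "Summarize the Salamandra model suite."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```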

Related benchmarks

Task                               | Dataset  | Metric   | Result | Rank
Reasoning                          | BBH      | Accuracy | 36.3   | 672
Instruction Following              | IFEval   | Accuracy | 24.2   | 625
Multitask Language Understanding   | MMLU     | Accuracy | 47.1   | 413
Question Answering                 | TriviaQA | Accuracy | 52.6   | 238
Science Question Answering         | ARC-C    | Accuracy | 52.2   | 193
Safety Evaluation                  | AdvBench | --       | --     | 117
Social Commonsense Reasoning       | SIQA     | Accuracy | 44.8   | 89
Physical Commonsense Reasoning     | PIQA     | Accuracy | 70.4   | 45
Reading Comprehension              | Belebele | Accuracy | 76.8   | 39
Multiple-choice Question Answering | EXAMS    | Accuracy | 62.7   | 29

(10 of 37 benchmark rows shown)
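The report makes its evaluation scripts public; independently of those, results of this kind can be approximated with a standard harness. The sketch below uses EleutherAI's lm-evaluation-harness (pip install lm-eval) to score one benchmark. The model id, task variant, and few-shot count are assumptions and will not necessarily match the exact setup behind the table.

```python
# Minimal sketch: score a base checkpoint on ARC-Challenge with
# EleutherAI's lm-evaluation-harness. This is illustrative only and is
# not necessarily the harness or configuration used for the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=BSC-LT/salamandra-7b,dtype=bfloat16",  # assumed repo id
    tasks=["arc_challenge"],  # corresponds to the ARC-C row
    num_fewshot=25,           # assumed few-shot setting
    batch_size=8,
)
print(results["results"]["arc_challenge"])  # per-metric scores, e.g. acc / acc_norm
```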
