EuroLLM: Multilingual Language Models for Europe

About

The quality of open-weight LLMs has seen significant improvement, yet they remain predominantly focused on English. In this paper, we introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs capable of understanding and generating text in all official European Union languages, as well as several additional relevant languages. We outline the progress made to date, detailing our data collection and filtering process, the development of scaling laws, the creation of our multilingual tokenizer, and the data mix and modeling configurations. Additionally, we release our initial models: EuroLLM-1.7B and EuroLLM-1.7B-Instruct and report their performance on multilingual general benchmarks and machine translation.

Pedro Henrique Martins, Patrick Fernandes, João Alves, Nuno M. Guerreiro, Ricardo Rei, Duarte M. Alves, José Pombal, Amin Farajian, Manuel Faysse, Mateusz Klimaszewski, Pierre Colombo, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-task Language Understanding | MMLU | Accuracy | 28.3 | 842
Instruction Following | IFEval | Accuracy (0-100) | 23.7 | 292
Science Question Answering | ARC Challenge | Accuracy | 35.9 | 234
Commonsense Reasoning | WinoGrande | Accuracy | 57.8 | 231
Question Answering | ARC-C | Accuracy | 31.57 | 166
Common Sense Reasoning | HellaSwag | Accuracy | 45.9 | 164
Science Question Answering | ARC Easy | Accuracy | 71.3 | 101
Multitask Language Understanding | MMLU-Pro | Accuracy | 10.9 | 99
Commonsense Reasoning | SocialIQA | Accuracy | 44.8 | 97
Linguistic and Cultural Competency | Polish Linguistic and Cultural Competency Benchmark (PLCC) | Avg Score | 41 | 52
Showing 10 of 78 rows
