
MiniLingua: A Small Open-Source LLM for European Languages

About

Large language models are powerful but often limited by high computational cost, privacy concerns, and English-centric training. Recent progress demonstrates that small, efficient models with around one billion parameters can deliver strong results and enable on-device use. This paper introduces MiniLingua, a multilingual open-source LLM of one billion parameters trained from scratch for 13 European languages, designed to balance coverage and instruction-following capabilities. Based on evaluation results, the instruction-tuned version of MiniLingua outperforms EuroLLM, a model with a similar training approach but a larger training budget, on summarization, classification and both open- and closed-book question answering. Moreover, it remains competitive with more advanced state-of-the-art models on open-ended generation tasks. We release model weights, tokenizer and source code used for data processing and model training.

Anna Aksenova, Boris Zverkov, Nicola Dainese, Alexander Nikitin, Pekka Marttinen • 2025

Related benchmarks

Task                   Dataset             Result          Rank
Reading Comprehension  Belebele            Accuracy 26.2   39
Machine Translation    Flores-200          COMET 0.343     23
Machine Translation    Flores-200 (test)   --              22
Question Answering     MMLU-X              Accuracy 24.5   12
Topic Classification   SIB200              Accuracy 24.8   11
Text Summarization     MassiveSum          Score 18.7      4
