
Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM

About

We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Gamayun addresses the lack of research on small non-English-centric LLMs by adopting a novel two-stage pre-training strategy: balanced multilingual training for cross-lingual alignment, followed by high-quality English enrichment to transfer performance gains across languages. Our model supports 12 languages, with special focus on Russian. Despite a significantly smaller training budget than comparable models, Gamayun outperforms LLaMA3.2-1B (9T tokens) on all considered benchmarks, and surpasses Qwen2.5-1.5B (18T tokens) on a wide range of English and multilingual tasks. It matches or exceeds Qwen3 (36T tokens) on most tasks outside advanced STEM, achieving state-of-the-art results in Russian, including the MERA benchmark, among the models of comparable size (1-2B parameters).
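To put the "significantly smaller training budget" in concrete terms, the token counts quoted above can be compared directly. The snippet below is only an illustrative sketch: the figures are those stated in the abstract, while the dictionary keys and the `budget_ratio` helper are hypothetical names introduced here, not anything from the paper.

```python
# Training-token budgets in trillions, as quoted in the abstract.
# The model labels used as keys are illustrative, not official identifiers.
BUDGETS_T = {
    "Gamayun": 2.5,
    "LLaMA3.2-1B": 9.0,
    "Qwen2.5-1.5B": 18.0,
    "Qwen3": 36.0,
}


def budget_ratio(baseline: str, model: str = "Gamayun") -> float:
    """Return how many times more training tokens `baseline` saw than `model`."""
    return BUDGETS_T[baseline] / BUDGETS_T[model]


if __name__ == "__main__":
    for name in BUDGETS_T:
        if name != "Gamayun":
            print(f"{name}: {budget_ratio(name):.1f}x Gamayun's token budget")
```

Under these figures, the baselines were trained on roughly 3.6, 7.2, and 14.4 times as many tokens as Gamayun, respectively.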

Alexander Podolskiy, Semen Molokov, Timofey Gerasin, Maksim Titov, Alexey Rukhovich, Artem Khrapov, Kirill Morozov, Evgeny Tetin, Constantine Korikov, Pavel Efimov, Polina Lazukova, Yuliya Skripkar, Nikita Okhotnikov, Irina Piontkovskaya, Meng Xiaojun, Zou Xueyi, Zhang Zhenhe • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Multi-task Language Understanding | MMLU | Accuracy 46.4 | 881 |
| Instruction Following | IFEval | -- | 836 |
| Commonsense Reasoning | WinoGrande | Accuracy 61.7 | 453 |
| Science Question Answering | ARC Challenge | Accuracy 42.5 | 354 |
| Question Answering | ARC-C | Accuracy 35.67 | 258 |
| Multitask Language Understanding | MMLU-Pro | Accuracy 24.9 | 248 |
| Common Sense Reasoning | HellaSwag | Accuracy 49.3 | 213 |
| Science Question Answering | ARC Easy | Accuracy 73.7 | 162 |
| Commonsense Reasoning | SocialIQA | Accuracy 48.2 | 158 |
| Reading Comprehension | C3 | Accuracy 75.29 | 89 |
Showing 10 of 62 rows
