Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM

About

We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Gamayun addresses the lack of research on small non-English-centric LLMs through a novel two-stage pre-training strategy: balanced multilingual training for cross-lingual alignment, followed by high-quality English enrichment to transfer performance gains across languages. The model supports 12 languages, with a particular focus on Russian. Despite a significantly smaller training budget than comparable models, Gamayun outperforms LLaMA3.2-1B (9T tokens) on all considered benchmarks and surpasses Qwen2.5-1.5B (18T tokens) on a wide range of English and multilingual tasks. It matches or exceeds Qwen3 (36T tokens) on most tasks outside advanced STEM, achieving state-of-the-art results in Russian among models of comparable size (1-2B parameters), including on the MERA benchmark.
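The two-stage strategy can be pictured as a data-mixture schedule that switches per-language sampling weights partway through training. The sketch below is a minimal Python illustration of that idea; the language set, stage boundary, and mixture weights are assumptions chosen for illustration and are not taken from the paper.

import random

# Illustrative sketch of a two-stage pre-training data schedule like the
# one described in the abstract. The language list, stage boundary, and
# mixture weights below are assumptions; the paper's actual proportions
# are not given on this page.

LANGS = ["ru", "en", "de", "fr", "es", "it",
         "pt", "pl", "uk", "zh", "ar", "tr"]   # hypothetical 12-language set

TOTAL_TOKENS = 2.5e12    # 2.5T tokens, per the abstract
STAGE1_FRACTION = 0.8    # assumed split between the two stages

def mixture_weights(tokens_seen: float) -> dict[str, float]:
    """Return per-language sampling weights for the current training stage."""
    if tokens_seen < STAGE1_FRACTION * TOTAL_TOKENS:
        # Stage 1: balanced multilingual mix for cross-lingual alignment.
        return {lang: 1.0 / len(LANGS) for lang in LANGS}
    # Stage 2: enrich with high-quality English, keep a multilingual tail.
    weights = {lang: 0.3 / (len(LANGS) - 1) for lang in LANGS}
    weights["en"] = 0.7   # assumed enrichment ratio
    return weights

def sample_language(tokens_seen: float) -> str:
    """Draw the language of the next training document."""
    weights = mixture_weights(tokens_seen)
    langs, probs = zip(*weights.items())
    return random.choices(langs, weights=probs, k=1)[0]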

Alexander Podolskiy, Semen Molokov, Timofey Gerasin, Maksim Titov, Alexey Rukhovich, Artem Khrapov, Kirill Morozov, Evgeny Tetin, Constantine Korikov, Pavel Efimov, Polina Lazukova, Yuliya Skripkar, Nikita Okhotnikov, Irina Piontkovskaya, Meng Xiaojun, Zou Xueyi, Zhang Zhenhe • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-task Language Understanding | MMLU | Accuracy | 46.4 | 842
Instruction Following | IFEval | Accuracy (0-100) | 56.2 | 292
Science Question Answering | ARC Challenge | Accuracy | 42.5 | 234
Commonsense Reasoning | WinoGrande | Accuracy | 61.7 | 231
Question Answering | ARC-C | Accuracy | 35.67 | 166
Commonsense Reasoning | HellaSwag | Accuracy | 49.3 | 164
Science Question Answering | ARC Easy | Accuracy | 73.7 | 101
Multi-task Language Understanding | MMLU-Pro | Accuracy | 24.9 | 99
Commonsense Reasoning | SocialIQA | Accuracy | 48.2 | 97
Reading Comprehension | C3 | Accuracy | 75.29 | 56
Top 10 of 62 benchmark entries shown.
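For context, leaderboard numbers like these are typically obtained with a standard evaluation harness. Below is a minimal sketch using EleutherAI's lm-evaluation-harness; the model identifier is hypothetical, and this page does not state which harness or settings produced the results above.

# Minimal sketch of evaluating a model on a few of the benchmarks above
# with EleutherAI's lm-evaluation-harness (pip install lm-eval). The model
# identifier is a placeholder; the leaderboard's actual evaluation setup
# is not stated on this page.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                     # Hugging Face backend
    model_args="pretrained=your-org/gamayun-1.5b",  # hypothetical model ID
    tasks=["mmlu", "hellaswag", "winogrande", "arc_challenge"],
    batch_size=8,
)
print(results["results"])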
