Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM

About

We present Gamayun, a 1.5B-parameter multilingual language model trained entirely from scratch on 2.5T tokens. Designed for efficiency and deployment in resource-constrained environments, Gamayun addresses the lack of research on small non-English-centric LLMs by adopting a novel two-stage pre-training strategy: balanced multilingual training for cross-lingual alignment, followed by high-quality English enrichment to transfer performance gains across languages. Our model supports 12 languages, with a special focus on Russian. Despite a significantly smaller training budget than comparable models, Gamayun outperforms LLaMA3.2-1B (9T tokens) on all considered benchmarks and surpasses Qwen2.5-1.5B (18T tokens) on a wide range of English and multilingual tasks. It matches or exceeds Qwen3 (36T tokens) on most tasks outside advanced STEM and achieves state-of-the-art results in Russian, including on the MERA benchmark, among models of comparable size (1-2B parameters).
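
To make the "significantly smaller training budget" claim concrete, the short sketch below compares the token counts quoted in the abstract. The figures (2.5T, 9T, 18T, 36T) come directly from the text above; the dictionary and the `budget_ratios` helper are illustrative names introduced here, not part of the paper.

```python
# Training-token budgets as quoted in the abstract, in trillions of tokens.
BUDGETS_T = {
    "Gamayun": 2.5,
    "LLaMA3.2-1B": 9.0,
    "Qwen2.5-1.5B": 18.0,
    "Qwen3": 36.0,
}


def budget_ratios(budgets, reference="Gamayun"):
    """Return how many times larger each model's token budget is than the reference's.

    `budget_ratios` is a hypothetical helper written for this illustration.
    """
    ref = budgets[reference]
    return {
        name: tokens / ref
        for name, tokens in budgets.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, ratio in budget_ratios(BUDGETS_T).items():
        print(f"{name}: {ratio:.1f}x the tokens used for Gamayun")
```

By these numbers, the comparison models were trained on roughly 3.6, 7.2, and 14.4 times as many tokens as Gamayun, respectively.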

Alexander Podolskiy, Semen Molokov, Timofey Gerasin, Maksim Titov, Alexey Rukhovich, Artem Khrapov, Kirill Morozov, Evgeny Tetin, Constantine Korikov, Pavel Efimov, Polina Lazukova, Yuliya Skripkar, Nikita Okhotnikov, Irina Piontkovskaya, Meng Xiaojun, Zou Xueyi, Zhang Zhenhe • 2025

Related benchmarks

Task	Dataset	Result
Multi-task Language Understanding	MMLU	Accuracy 46.4	881
Instruction Following	IFEval	--	836
Commonsense Reasoning	WinoGrande	Accuracy 61.7	453
Science Question Answering	ARC Challenge	Accuracy 42.5	354
Question Answering	ARC-C	Accuracy 35.67	258
Multitask Language Understanding	MMLU-Pro	Accuracy 24.9	248
Common Sense Reasoning	HellaSwag	Accuracy 49.3	213
Science Question Answering	ARC Easy	Accuracy 73.7	162
Commonsense Reasoning	SocialIQA	Accuracy 48.2	158
Reading Comprehension	C3	Accuracy 75.29	89

Showing 10 of 62 rows
