Building a Strong Instruction Language Model for a Less-Resourced Language

About

Large language models (LLMs) have become an essential tool for natural language processing and artificial intelligence in general. Current open-source models are primarily trained on English texts, resulting in poorer performance on less-resourced languages and cultures. We present a set of methodological approaches necessary for the successful adaptation of an LLM to a less-resourced language, and demonstrate them using the Slovene language. We present GaMS3-12B, a generative model for Slovene with 12 billion parameters, and demonstrate that it is the best-performing open-source model for Slovene within its parameter range. We adapted the model to the Slovene language using three-stage continual pretraining of the Gemma 3 model, followed by two-stage supervised fine-tuning (SFT). We trained the model on a combination of 140B pretraining tokens in Slovene, English, Bosnian, Serbian, and Croatian, and over 200 thousand English and Slovene SFT examples. We evaluate GaMS3-12B on the Slovenian-LLM-Eval datasets, English-to-Slovene translation, and the Slovene LLM arena. We show that the described model outperforms the 12B Gemma 3 model across all three scenarios and performs comparably to the much larger commercial GPT-4o model in the Slovene LLM arena, achieving a win rate of over 60%.

Domen Vreš, Tjaša Arčon, Timotej Petrič, Dario Vajda, Marko Robnik-Šikonja, Iztok Lebar Bajec • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| General Chat Performance | Slovene-LLM-Arena 2026-01-13 (leaderboard) | ELO Score: 1.03e+3 | 15 |
| Large Language Model Evaluation | Slovene-LLM-Eval | Average Rank: 4.25 | 10 |
| English to Slovene translation | Wikipedia | COMET Score: 74.1844 | 8 |
| English to Slovene translation | Nemotron-Chat | COMET Score: 0.6745 | 8 |
| English to Slovene translation | English-to-Slovene (Overall) | Overall COMET Score: 0.7002 | 8 |
| English to Slovene translation | CC News | COMET Score: 68.4848 | 8 |
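
To relate the arena win rate quoted in the abstract to the Elo-style score listed above, the following minimal sketch uses the standard logistic Elo formula with a 400-point scale. This is an assumption: the listing does not say how the Slovene LLM arena computes its ratings or handles ties, and the function names here are hypothetical.

```python
import math


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected win probability of A over B under the standard Elo model.

    Assumption: logistic curve with a 400-point scale; the arena's actual
    rating method may differ.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def rating_gap_for_win_rate(win_rate: float, scale: float = 400.0) -> float:
    """Rating difference (A minus B) implied by a given win rate of A over B."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return scale * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    gap = rating_gap_for_win_rate(0.60)
    print(f"A 60% win rate implies a gap of about {gap:.1f} Elo points")
    print(f"Round trip: {expected_score(gap, 0.0):.2f}")
```

Under these assumptions, a 60% head-to-head win rate corresponds to a gap of roughly 70 rating points, which gives a sense of scale when reading leaderboard differences.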