
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study

About

Large language models (LLMs) have shown continuously improving multilingual capabilities, and even small-scale open-source models have demonstrated rapid performance gains. In this paper, we systematically explore the ability of open LLMs with fewer than ten billion parameters to handle multilingual machine translation (MT) tasks. We conduct comprehensive evaluations on six popular LLMs and find that models like Gemma2-9B exhibit impressive multilingual translation capabilities. We then introduce the Parallel-First Monolingual-Second (PFMS) data mixing strategy in the continual pretraining stage to further enhance MT performance and present GemmaX2-28, a 9B model achieving top-tier multilingual translation performance across 28 languages. Specifically, GemmaX2-28 consistently outperforms state-of-the-art (SOTA) models such as TowerInstruct and X-ALMA and achieves performance competitive with Google Translate and GPT-4-turbo.

Menglong Cui, Pengzhi Gao, Wei Liu, Jian Luan, Bin Wang • 2025

Related benchmarks

| Task | Dataset | Metric | Score | Rank |
| --- | --- | --- | --- | --- |
| Machine Translation | FLORES+ (test) | spBLEU | 45.09 | 128 |
| Machine Translation | WMT24++ v1.0 (test) | XCOMET | 80.65 | 49 |
| Translation | FLORES-200 en-it (devtest) | sacreBLEU | 33.831 | 35 |
| Translation | FLORES-200 it-en (devtest) | sacreBLEU | 37.0319 | 35 |
| Machine Translation (xx -> zh) | FLORES+ latest (test) | spBLEU | 33.77 | 30 |
| Machine Translation | FLORES-200 ZH ⇔ XX 2022 | XCOMET-XXL | 0.7142 | 17 |
| Machine Translation | FLORES-200 EN ⇔ XX 2022 | XCOMET-XXL | 82.08 | 17 |
| Machine Translation | FLORES-200 XX ⇔ XX 2022 | XCOMET-XXL | 63.76 | 17 |
| Machine Translation | WMT 2025 (test) | XCOMET-XXL | 26.79 | 17 |
| Machine Translation | Mandarin ⇔ Minority (test) | XCOMET-XXL | 0.3596 | 16 |
