Controlling Language Confusion in Multilingual LLMs

About

Large language models often suffer from language confusion, a phenomenon in which responses are partially or entirely generated in unintended languages. This critically degrades the user experience, especially in low-resource settings. We hypothesize that this issue stems from limitations in conventional fine-tuning objectives, such as supervised fine-tuning (SFT), which optimize the likelihood of correct tokens without explicitly penalizing undesired outputs such as cross-lingual mixing. Analysis of loss trajectories during pretraining further reveals that models fail to distinguish between monolingual and language-mixed texts, highlighting the absence of inherent pressure to avoid such confusion. In this work, we apply ORPO (Odds Ratio Preference Optimization), which adds penalties for unwanted output styles to standard SFT, effectively suppressing language-confused generations. ORPO maintains strong language consistency, even under high decoding temperatures, while preserving general QA performance. Our findings suggest that incorporating appropriate penalty terms can effectively mitigate language confusion in multilingual models, particularly in low-resource scenarios.
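To make the "SFT plus penalty" idea concrete, the following is a minimal sketch of an ORPO-style objective for a single preference pair. It is not the authors' implementation. It assumes the chosen response is written in the target language and the rejected response is a language-confused one, that per-token log-probabilities are already available, and that the penalty weight `lam` is a hypothetical value chosen only for illustration.

```python
import math

def avg_log_prob(token_log_probs):
    """Length-normalized log-likelihood of a response."""
    return sum(token_log_probs) / len(token_log_probs)

def log_odds(avg_logp):
    """log(p / (1 - p)) with p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    """Return (total, sft, penalty) for one preference pair.

    chosen_token_logps:   token log-probs of the desired response
                          (assumed here: a monolingual, target-language answer)
    rejected_token_logps: token log-probs of the undesired response
                          (assumed here: a language-confused answer)
    lam:                  hypothetical weight on the odds-ratio penalty
    """
    logp_chosen = avg_log_prob(chosen_token_logps)
    logp_rejected = avg_log_prob(rejected_token_logps)

    # Standard SFT term: negative log-likelihood of the desired response.
    sft = -logp_chosen

    # Odds-ratio term: -log sigmoid(log-odds ratio) = softplus(-ratio).
    # It grows when the model finds the rejected response relatively likely.
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    penalty = softplus(-ratio)

    return sft + lam * penalty, sft, penalty

if __name__ == "__main__":
    # Made-up log-probs purely for illustration.
    total, sft, penalty = orpo_loss([-0.1, -0.2], [-1.0, -2.0])
    print(f"sft={sft:.4f} penalty={penalty:.4f} total={total:.4f}")
```

Plain SFT would use only the `sft` term, which is unaffected by how likely the language-confused response is. The added `penalty` term is what supplies explicit pressure against it.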

Nahyun Lee, Yeongseo Woo, Hyunwoo Ko, Guijin Son • 2025

Related benchmarks

Task	Dataset	Result
Reasoning	BBH	Accuracy 49.22	770
Multitask Language Understanding	MMLU	Accuracy 51.58	568
Graduate-level Question Answering	GPQA	Accuracy 30.52	224
Math Word Problem Solving	GSM8K	Accuracy 78.09	158
Mathematical Problem Solving	MATH	Accuracy 43.78	114
Instruction Following	MIF en	Accuracy 66	10
Instruction Following	MIF (target)	Accuracy 43.82	10
Multitask Language Understanding	MMMLU (target)	RPR 57.86	5
Science Q&A	ARC-C en	Accuracy 82.99	5
Science Question Answering	GPQA EN	Accuracy 31.52	5

Showing 10 of 27 rows
