Cross-lingual Self-Consistency for Multilingual Reasoning with Language Models

About

Although LLMs continue to expand their multilingual coverage, their advanced reasoning capabilities remain largely confined to a few high-resource languages such as English. Existing methods for closing this gap are limited by the scarcity of multilingual reasoning data and generalize poorly to unseen languages. To address this, we propose an unsupervised Reinforcement Learning (RL) approach that enhances multilingual reasoning by enforcing cross-lingual self-consistency: the principle that a model should produce the same final answer for equivalent problems in different languages. Our approach requires neither gold answers nor parallel data, and it achieves average gains of up to 21.7% on MGSM across 10 languages. It also generalizes strongly, with an 18.2% mean improvement on MGSM languages unseen during training and gains of up to 6.2% on three out-of-distribution benchmarks. These results show the potential of consistency-based methods to improve the multilingual capabilities of LLMs without requiring supervised data.
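To make the consistency principle concrete, here is a minimal sketch of one way a cross-lingual agreement signal could be computed without gold answers. It is an illustration under stated assumptions, not the authors' reward: the abstract does not specify the reward formulation, so the majority-vote scheme, the function name `consistency_rewards`, and the input layout (one extracted final answer per language, or `None` when extraction fails) are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def consistency_rewards(answers_by_language: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Score each language's final answer by agreement with the cross-lingual majority.

    Hypothetical scheme (not taken from the paper): an answer earns 1.0 if it
    matches the most common answer across languages, and 0.0 otherwise.
    No gold answer is used anywhere.
    """
    # Ignore languages where no final answer could be extracted.
    valid = [a for a in answers_by_language.values() if a is not None]
    if not valid:
        return {lang: 0.0 for lang in answers_by_language}

    # Ties are broken in favor of the answer encountered first.
    majority_answer, _ = Counter(valid).most_common(1)[0]

    return {
        lang: 1.0 if ans == majority_answer else 0.0
        for lang, ans in answers_by_language.items()
    }


# The same problem posed in four languages; "sw" produced no parsable answer.
example = {"en": "42", "es": "42", "bn": "41", "sw": None}
print(consistency_rewards(example))
# {'en': 1.0, 'es': 1.0, 'bn': 0.0, 'sw': 0.0}
```

In an RL loop, scores like these could stand in for a correctness reward. Whether the paper uses majority voting, pairwise agreement, or another formulation cannot be determined from the abstract.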

Ahmed Elhady, Eneko Agirre, Mikel Artetxe • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MGSM (test)	Accuracy (ZH): 88	80
Mathematical Reasoning	MGSM	Accuracy (Bn): 64.4	66
Mathematical Reasoning	MMATH	Accuracy: 78.4	36
Mathematical Reasoning	PolyMath	Accuracy: 20.9	12
Multilingual Question Answering	mGPQA	Accuracy: 32.9	12

