
Cross-lingual Self-Consistency for Multilingual Reasoning with Language Models

About

Despite expanding their multilingual coverage, the advanced reasoning capabilities of LLMs remain largely confined to a few high-resource languages like English. To address this, we propose an unsupervised Reinforcement Learning (RL) approach to enhance multilingual reasoning by enforcing cross-lingual self-consistency: the principle that a model should produce the same final answer for equivalent problems in different languages. Existing methods are limited by the scarcity of multilingual reasoning data and show weak generalization to unseen languages. Our approach requires neither gold answers nor parallel data, and it achieves average gains of up to 21.7% on MGSM across 10 languages. In addition, our method demonstrates strong generalization, with an 18.2% mean improvement on MGSM languages unseen during training, and up to 6.2% gain on 3 out-of-distribution benchmarks. These results show the potential of consistency-based methods to improve the multilingual capabilities of LLMs without requiring supervised data.

Ahmed Elhady, Eneko Agirre, Mikel Artetxe • 2026
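The abstract states the consistency principle but not how it becomes a training signal. As a rough illustration only, and not the authors' reward, the sketch below scores each language's final answer by the share of other languages that reached the same answer. No gold answer is used. The function name `agreement_rewards`, the scoring rule, and the sample answers are all assumptions made for this example.

```python
from collections import Counter


def agreement_rewards(answers_by_language: dict[str, str]) -> dict[str, float]:
    """Score each language by the share of other languages with the same final answer.

    Hypothetical helper for illustration; the paper's actual reward may differ.
    """
    if len(answers_by_language) < 2:
        raise ValueError("need answers in at least two languages")
    # Light normalization so that "18" and " 18 " count as the same answer.
    normalized = {lang: ans.strip() for lang, ans in answers_by_language.items()}
    counts = Counter(normalized.values())
    others = len(normalized) - 1
    # Subtract 1 so a language does not count agreement with itself.
    return {lang: (counts[ans] - 1) / others for lang, ans in normalized.items()}


# Made-up final answers to one problem posed in four languages.
rewards = agreement_rewards({"en": "18", "es": "18", "zh": "18", "bn": "20"})
```

Here the three agreeing languages each receive 2/3 and the outlier receives 0. In an RL setup, a signal of this kind could be computed per sampled response without any reference labels, which matches the unsupervised setting the abstract describes.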

Related benchmarks

Task | Dataset | Result | Rank
Mathematical Reasoning | MGSM (test) | Accuracy (ZH): 88 | 80
Mathematical Reasoning | MGSM | Accuracy (Bn): 64.4 | 49
Mathematical Reasoning | MMATH | Accuracy: 78.4 | 36
Mathematical Reasoning | PolyMath | Accuracy: 20.9 | 12
Multilingual Question Answering | mGPQA | Accuracy: 32.9 | 12
