MindMerger: Efficient Boosting LLM Reasoning in non-English Languages

About

Reasoning capabilities are crucial for Large Language Models (LLMs), yet a notable gap exists between English and non-English languages. To bridge this disparity, some works fine-tune LLMs to relearn reasoning capabilities in non-English languages, while others replace non-English inputs with an external model's outputs, such as English translations, to circumvent the difficulty LLMs have in understanding non-English text. Unfortunately, these methods often underutilize the built-in reasoning and language understanding capabilities of LLMs. To better utilize the minds of reasoning and language understanding in LLMs, we propose a new method, MindMerger, which merges LLMs with the external language understanding capabilities of multilingual models to boost multilingual reasoning performance. Furthermore, a two-step training scheme is introduced: the first step trains the model to embed the external capabilities into LLMs, and the second trains the collaborative use of the external capabilities and the built-in capabilities of LLMs. Experiments on three multilingual reasoning datasets and a language understanding dataset demonstrate that MindMerger consistently outperforms all baselines, especially in low-resource languages. Without updating the parameters of LLMs, the average accuracy on the MGSM dataset improved by 6.7% across all languages and by 8.0% across low-resource languages.
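
The abstract does not say how the external multilingual model is merged with the LLM, so the following is only a minimal sketch under stated assumptions: that the external encoder's hidden states are projected by a small linear mapping into the LLM's embedding width, and that the projected states are prepended to the LLM's own token embeddings. The names `linear_map` and `merge_inputs`, the toy shapes, and the numbers are all hypothetical and chosen for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear_map(states: Matrix, weight: Matrix, bias: Vector) -> Matrix:
    """Project each external-encoder state into the LLM embedding width.

    `weight` has shape (d_ext, d_llm) and `bias` has length d_llm.
    Assumption: the mapping is a single linear layer.
    """
    d_llm = len(bias)
    projected = []
    for state in states:
        row = [
            sum(state[i] * weight[i][j] for i in range(len(state))) + bias[j]
            for j in range(d_llm)
        ]
        projected.append(row)
    return projected


def merge_inputs(external_states: Matrix, llm_embeddings: Matrix,
                 weight: Matrix, bias: Vector) -> Matrix:
    """Assumed merge: prepend mapped external states to the LLM's embeddings."""
    mapped = linear_map(external_states, weight, bias)
    return mapped + llm_embeddings


# Hypothetical toy data: 2 external tokens (width 3), 2 LLM tokens (width 2).
external = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
llm = [[0.5, 0.5], [0.1, 0.9]]
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0]

merged = merge_inputs(external, llm, W, b)
print(len(merged), len(merged[0]))  # sequence length and embedding width
```

In a setup like this, only the mapping weights (`W` and `b` here) would need training, which is one way to stay consistent with the abstract's statement that the LLM's parameters are not updated.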

Zixian Huang, Wenhao Zhu, Gong Cheng, Lei Li, Fei Yuan • 2024

Related benchmarks

Task	Dataset	Result
Natural Language Inference	XNLI (test)	Average Accuracy: 78.4	167
Mathematical Reasoning	MGSM (test)	Accuracy (ZH): 70	80
Multilingual Mathematical Reasoning	MGSM	--	52
Mathematical Reasoning	MGSM	Accuracy (Bn): 66.8	49
Machine Translation	Flores-101 (test)	Average Score: 3.55e+3	41
Multilingual Mathematical Reasoning	MGSM 1.0 (test)	Accuracy (ru): 69.6	35
Multilingual Mathematical Reasoning	MSVAMP	Accuracy (English): 67.8	33
Commonsense Reasoning	X-CSQA (test)	Average Accuracy: 61	20
Mathematical Reasoning	MSVAMP	Average Accuracy: 78.3	20
Abstractive Summarization	XL-Sum (test)	Language Democratization: 24.62	20

Showing 10 of 14 rows

