
X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions

About

Large language models respond well in high-resource languages such as English but struggle in low-resource languages, likely because high-quality instruction-following data in those languages is scarce. Directly translating English samples into these languages can be a solution, but it is unreliable: the resulting responses carry translation errors and lack language-specific or cultural knowledge. To address this issue, we propose a novel method for constructing cross-lingual instruction-following samples whose instructions are in English and whose responses are in low-resource languages. Specifically, a language model first learns to generate appropriate English instructions for natural web texts in other languages, treating those texts as responses. The candidate cross-lingual instruction-tuning samples are then refined and diversified. Using this method, we build X-Instruction, a large-scale cross-lingual instruction-tuning dataset covering 10 languages. Instruction data built with our method incorporates more language-specific knowledge than the naive translation approach. Experimental results show that the response quality of a model tuned on X-Instruction greatly exceeds that of a model distilled from a powerful teacher model, reaching or even surpassing ChatGPT. In addition, we find that models tuned on cross-lingual instruction-following samples can follow instructions in the output language without further tuning.
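The pipeline described above (generate an English instruction for a non-English web text, then filter the candidate pairs) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_instruction` and `score_pair` are hypothetical stand-ins for the language-model components the method actually uses, and the filtering threshold is arbitrary.

```python
def generate_instruction(text: str) -> str:
    """Stub: in the real pipeline, an LM generates an English
    instruction whose ideal answer is `text` (a natural web text
    in the target low-resource language)."""
    return f"Write a short passage in the style of: {text.split()[0]} ..."

def score_pair(instruction: str, response: str) -> float:
    """Stub: in the real pipeline, an LM-based scorer rates how well
    the instruction matches the response. Here, a trivial length
    heuristic serves as a placeholder."""
    return min(len(response.split()) / 10.0, 1.0)

def build_cross_lingual_samples(web_texts, threshold=0.5):
    """Pair each non-English web text (used as the response) with a
    generated English instruction, keeping only well-matched candidates."""
    samples = []
    for text in web_texts:
        instruction = generate_instruction(text)
        if score_pair(instruction, text) >= threshold:
            samples.append({"instruction": instruction, "response": text})
    return samples

texts = [
    "Jakarta adalah ibu kota Indonesia dan kota terbesar di negara itu.",
    "Halo.",  # too short: rejected by the stub scorer
]
samples = build_cross_lingual_samples(texts)
```

A real implementation would replace both stubs with LM calls and add the refinement and diversification passes the paper describes; the sketch only shows the overall generate-then-filter shape of the candidate construction.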

Chong Li, Wen Yang, Jiajun Zhang, Jinliang Lu, Shaonan Wang, Chengqing Zong • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Natural Language Inference | XNLI | – | – | 111
Commonsense Reasoning | XStoryCloze | Average Score | 54 | 32
Causal Reasoning | XCOPA (test) | Accuracy (id) | 62 | 13
Instruction Following | Vicuna & WizardLM (Finnish) | Win Rate (vs ChatGPT) | 47 | 9
Instruction Following | Vicuna & WizardLM (Indonesian) | Win Rate (vs ChatGPT) | 50.3 | 9
Instruction Following | Vicuna & WizardLM (Thai) | Win Rate (vs ChatGPT) | 53 | 9
Instruction Following | Vicuna & WizardLM (Turkish) | Win Rate (vs ChatGPT) | 53.7 | 9
Instruction Following | Vicuna & WizardLM (Vietnamese) | Win Rate (vs ChatGPT) | 57 | 9
Instruction Following | Vicuna & WizardLM (Bengali) | Win Rate (vs ChatGPT) | 68.8 | 9
Instruction Following | Vicuna & WizardLM (Hindi) | Win Rate (vs ChatGPT) | 65.8 | 9

(Showing 10 of 20 rows)

Other info

Code
