
How Vocabulary Sharing Facilitates Multilingualism in LLaMA?

About

Large Language Models (LLMs) often show strong performance on English tasks while exhibiting limitations in other languages. What multilingual capability does an LLM acquire when it is trained on only certain languages? The underlying mechanism remains unclear. This study examines the multilingual capability of LLMs from the vocabulary-sharing perspective through an exhaustive analysis across 101 languages. By investigating the performance gap before and after embedding fine-tuning, we discover four distinct quadrants. Delving into each quadrant, we provide actionable and efficient guidelines for tuning these languages. Extensive experiments reveal that existing LLMs possess multilingual capabilities that surpass our expectations, and that their multilingual performance can be significantly improved based on the attributes of each quadrant. Code: https://github.com/CONE-MT/Vocabulary-Sharing-Facilitates-Multilingualism
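As context for the embedding fine-tuning the abstract refers to, here is a minimal sketch of tuning only the embedding layer of a LLaMA-style model, assuming a Hugging Face checkpoint; the model name, learning rate, and training sentence are illustrative placeholders, not values from the paper:

```python
# Minimal sketch of embedding-only fine-tuning for a LLaMA-style model.
# Checkpoint name, learning rate, and training text are illustrative
# placeholders, not values taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze every parameter, then unfreeze only the input embedding table.
for param in model.parameters():
    param.requires_grad = False
for param in model.get_input_embeddings().parameters():
    param.requires_grad = True

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

# One illustrative training step on a target-language sentence.
batch = tokenizer("Un exemple de phrase dans la langue cible.",
                  return_tensors="pt")
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Because gradients flow only into the embedding table, each per-language run updates a small fraction of the model's parameters, which keeps the before/after comparison described in the abstract relatively cheap to repeat across many languages.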

Fei Yuan, Shuai Yuan, Zhiyong Wu, Lei Li • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Natural Language Inference | XNLI | Accuracy | 38.3 | 111 |
| Story Reasoning | XStoryCloze | Accuracy | 55.9 | 27 |
| Natural Language Generation | Flores-101 | spBLEU | 31.4 | 11 |
| Natural Language Understanding | MGSM | Accuracy | 6.2 | 11 |
| Natural Language Understanding | PAWS-X | Accuracy | 54.6 | 11 |
| Natural Language Understanding | XCOPA 1.0 (test) | Accuracy | 54.3 | 11 |
