Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models

About

Large Language Models (LLMs) currently face two shortcomings. The first is a shortage of multilingual models: most LLMs are English-centric and their performance on multilingual reasoning is limited. The second concerns where external knowledge is placed: most retrieved knowledge is prepended to the user query, which may be sub-optimal. This paper presents a novel, simple, yet effective method called Dictionary Insertion Prompting (DIP). Given a non-English prompt, DIP looks up a word dictionary and inserts the words' English counterparts into the middle of the prompt. This enables better translation into English and better English reasoning steps, which leads to clearly better results. We experiment with 10 to 200 languages from FLORES-200. (The number of languages varies across datasets; we experiment with 200 languages on GSM8K, as reported in the Appendix.) Since no adequate datasets exist, we use the NLLB translator to create synthetic multilingual benchmarks from four existing English reasoning benchmarks, including GSM8K and AQuA. The synthetic benchmarks are translated back into English for quality assurance with manual annotation. Interestingly, where the dictionary is injected is an important factor in the performance gains: interleaving the dictionary with the original words performs better than prepending or appending it, even when the same dictionary is used.
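The difference between interleaving and prepending a dictionary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `word (english)` annotation format, the regex-based word matching, and the toy Spanish dictionary are all assumptions made for illustration only.

```python
import re
from typing import Dict

# Toy dictionary (hypothetical; a real setup would use a bilingual lexicon).
TOY_DICTIONARY: Dict[str, str] = {
    "tiene": "has",
    "cinco": "five",
    "manzanas": "apples",
}


def dip_interleave(prompt: str, dictionary: Dict[str, str]) -> str:
    """Insert each known word's English counterpart right after the word."""

    def annotate(match: "re.Match[str]") -> str:
        word = match.group(0)
        english = dictionary.get(word.lower())
        # Assumed format: "word (english)"; unknown words are left as-is.
        return f"{word} ({english})" if english is not None else word

    # \w+ is Unicode-aware for str patterns, so non-Latin scripts also match.
    return re.sub(r"\w+", annotate, prompt)


def dictionary_prepend(prompt: str, dictionary: Dict[str, str]) -> str:
    """Baseline for comparison: put the same dictionary entries before the prompt."""
    pairs = [
        f"{word} = {dictionary[word.lower()]}"
        for word in re.findall(r"\w+", prompt)
        if word.lower() in dictionary
    ]
    if not pairs:
        return prompt
    return "Dictionary: " + "; ".join(pairs) + "\n" + prompt


if __name__ == "__main__":
    question = "Juan tiene cinco manzanas."
    print(dip_interleave(question, TOY_DICTIONARY))
    # Juan tiene (has) cinco (five) manzanas (apples).
    print(dictionary_prepend(question, TOY_DICTIONARY))
```

Both functions draw on the same dictionary, so any difference in downstream accuracy would come from placement alone, which is the comparison the abstract describes.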

Hongyuan Lu, Zixuan Li, Wai Lam • 2024

Related benchmarks

Task	Dataset	Result
Date Understanding	Date Understanding FLORES-200 10-languages	Performance (kaz_Cyrl): 72.4	14
Math Reasoning	SVAMP	Kazakh (Cyrl) Accuracy: 78.33	7
Math Word Problem Solving	SVAMP 10 low-resourced languages FLORES-200 (test)	Kazakh (Cyrillic) Accuracy: 44	7
Mathematical Reasoning	SVAMP 10 low-resourced languages FLORES-200	Kazakh (Cyrl) Score: 7.33	7
Mathematical Reasoning	GSM8K FLORES-200 (10 low-resourced languages) (test)	Kazakh (Cyrl) Accuracy: 67.93	7
Sports Understanding	Sports Understanding 10 low-resourced languages FLORES-200	Kazakh (Cyrl) Score: 58.8	7
Date Understanding	FLORES-200 10 low-resourced languages	Performance Score (kaz_Cyrl): 20.4	7
