Toward Robust Multilingual Adaptation of LLMs for Low-Resource Languages

About

Large language models (LLMs) continue to struggle with low-resource languages, primarily due to limited training data, translation noise, and unstable cross-lingual alignment. To address these challenges, we propose LiRA (Linguistic Robust Anchoring for LLMs), a plug-and-play framework that requires only lightweight fine-tuning on top of existing pretrained backbones. LiRA jointly optimizes representation stability and cross-lingual semantic consistency by combining two key components: Arca (Anchored Representation Composition Architecture), which aligns low-resource inputs to a shared English semantic space through anchor-based alignment and collaborative encoding; and LaSR (Language-coupled Semantic Reasoner), a lightweight, language-aware head that enforces consistency regularization for unified cross-lingual understanding, retrieval, and reasoning. We theoretically show that under controlled anchoring error and translation-induced bias, LiRA guarantees bounded representation deviation and stable downstream performance under local Lipschitz continuity. To facilitate research, we release a new multilingual product retrieval dataset covering five Southeast Asian and two South Asian languages. Extensive experiments across diverse low-resource benchmarks demonstrate consistent improvements in retrieval, ranking, question answering, and reasoning tasks. Code will be publicly available on GitHub, and the dataset will be hosted on Hugging Face.
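
The stability claim can be read informally as a Lipschitz argument. The sketch below uses assumed notation and is not the paper's theorem: $h$ stands for an ideal representation in the shared English space, $\hat h$ for the representation actually produced for a low-resource input, $\varepsilon_a$ and $\varepsilon_t$ for bounds on the anchoring error and the translation-induced bias, and $f$ for a downstream head that is $L$-Lipschitz on a neighborhood containing both $h$ and $\hat h$.

```latex
\|\hat h - h\| \le \varepsilon_a + \varepsilon_t
\quad\Longrightarrow\quad
|f(\hat h) - f(h)| \le L\,\|\hat h - h\| \le L\,(\varepsilon_a + \varepsilon_t)
```

Under these assumptions, keeping the two error terms small keeps the change in downstream output small, which is one way to interpret "bounded representation deviation and stable downstream performance".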

Haolin Li, Haipeng Zhang, Mang Li, Yaohua Wang, Lijie Wen, Yu Zhang, Biqing Huang • 2025

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	MGSM (test)	Accuracy (ZH) 70.3	80
Retrieval	BelebeleRetrieval	nDCG@10 87.03	26
Retrieval	LazRetrieval	BD Retrieval Score 66.3	16
Information Retrieval	MLQA Retrieval	nDCG@10 82.01	14
Semantic Textual Similarity	STS22	Pearson Correlation 76.55	14
Multilingual Commonsense Reasoning	X-CSQA	Accuracy (SW) 40.8	10
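
Two of the rows above report nDCG@10. For readers unfamiliar with the metric, here is a minimal sketch of the standard linear-gain definition. It is not taken from the paper's evaluation code; the function names are illustrative, and it assumes the input list holds the graded relevance of every judged document in ranked order, so that sorting it gives the ideal ranking. The table values appear to be on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k graded relevances (linear gain)."""
    # Rank positions are 1-based in the formula, hence log2(rank + 2) for 0-based enumerate.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """nDCG@k: DCG of the given ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([1, 0, 0])` is 1.0 because the only relevant document is already ranked first, while moving it to the second position lowers the score to 1 / log2(3).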
