
Toward Robust Multilingual Adaptation of LLMs for Low-Resource Languages

About

Large language models (LLMs) continue to struggle with low-resource languages, primarily due to limited training data, translation noise, and unstable cross-lingual alignment. To address these challenges, we propose LiRA (Linguistic Robust Anchoring for LLMs), a plug-and-play framework that requires only lightweight fine-tuning on top of existing pretrained backbones. LiRA jointly optimizes representation stability and cross-lingual semantic consistency by combining two key components: Arca (Anchored Representation Composition Architecture), which aligns low-resource inputs to a shared English semantic space through anchor-based alignment and collaborative encoding; and LaSR (Language-coupled Semantic Reasoner), a lightweight, language-aware head that enforces consistency regularization for unified cross-lingual understanding, retrieval, and reasoning. We theoretically show that under controlled anchoring error and translation-induced bias, LiRA guarantees bounded representation deviation and stable downstream performance under local Lipschitz continuity. To facilitate research, we release a new multilingual product retrieval dataset covering five Southeast Asian and two South Asian languages. Extensive experiments across diverse low-resource benchmarks demonstrate consistent improvements in retrieval, ranking, question answering, and reasoning tasks. Code will be publicly available on GitHub, and the dataset will be hosted on Hugging Face.
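
The abstract does not state the bound itself, so the following is only a generic sketch of the shape such a guarantee can take, not the paper's theorem. All symbols are assumptions introduced here for illustration: h(x) is the representation produced for a low-resource input, h*(x) is its ideal counterpart in the English semantic space, ε_anc and ε_tr are upper bounds on the anchoring error and the translation-induced bias, and f is a downstream head that is L-Lipschitz on a neighborhood containing both representations.

```latex
% Illustrative only: every symbol here is assumed, none is taken from the paper.
\[
\|h(x) - h^{*}(x)\| \le \varepsilon_{\mathrm{anc}} + \varepsilon_{\mathrm{tr}}
\quad\Longrightarrow\quad
\|f(h(x)) - f(h^{*}(x))\| \le L\,\bigl(\varepsilon_{\mathrm{anc}} + \varepsilon_{\mathrm{tr}}\bigr).
\]
```

The implication follows directly from the definition of Lipschitz continuity: if the two error sources are kept small, the downstream output can move by at most L times their combined size.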

Haolin Li, Haipeng Zhang, Mang Li, Yaohua Wang, Lijie Wen, Yu Zhang, Biqing Huang • 2025

Related benchmarks

Task                                 Dataset            Metric               Result  Rank
Mathematical Reasoning               MGSM (test)        Accuracy (ZH)        70.3    80
Retrieval                            BelebeleRetrieval  nDCG@10              87.03   26
Retrieval                            LazRetrieval       BD Retrieval Score   66.3    16
Information Retrieval                MLQA Retrieval     nDCG@10              82.01   14
Semantic Textual Similarity          STS22              Pearson Correlation  76.55   14
Multilingual Commonsense Reasoning   X-CSQA             Accuracy (SW)        40.8    10
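
For readers unfamiliar with the nDCG@10 metric reported for the retrieval benchmarks, here is a minimal Python sketch of how it is commonly computed. It is not the evaluation code used for these results. It assumes the linear-gain variant of DCG and assumes that the ranked list already contains every relevant document, so the ideal ordering can be obtained by sorting the same list. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k relevance grades (linear gain)."""
    # enumerate() yields 0-based positions, so the log discount uses position + 2.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking.

    Assumption: `relevances` lists the grade of every retrieved document in
    ranked order and includes all relevant documents, so sorting it in
    descending order gives the ideal ranking.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # One relevant document placed second instead of first.
    print(ndcg_at_k([0, 1, 0]))
```

The function returns a value between 0 and 1. The table's scores appear to be on a 0 to 100 scale, in which case a value such as 87.03 would correspond to 0.8703 here.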
