XITE: Cross-lingual Interpolation for Transfer using Embeddings

About

Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities, and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA) prior to interpolation further boosts performance. With XLM-R, XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference across a diverse set of target languages including Korean, Arabic, Urdu and Hindi. Apart from boosting cross-lingual transfer, adaptation using XITE also safeguards against forgetting and maintains task performance on the high-resource language.
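
The matching and interpolation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: cosine similarity is assumed as the similarity measure (the abstract only says "embedding-based similarities"), the mixing weight `lam` and all function names are hypothetical, the toy vectors stand in for XLM-R sentence embeddings, and the LDA projection step is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_source(target: Vector, sources: Sequence[Vector]) -> int:
    """Index of the labeled source embedding most similar to the target embedding."""
    return max(range(len(sources)), key=lambda i: cosine(target, sources[i]))


def interpolate(source: Vector, target: Vector, lam: float = 0.5) -> List[float]:
    """Linear mix lam * source + (1 - lam) * target; lam is an assumed hyperparameter."""
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]


def augment(
    targets: Sequence[Vector],
    sources: Sequence[Vector],
    labels: Sequence[str],
    lam: float = 0.5,
) -> List[Tuple[List[float], str]]:
    """For each unlabeled target embedding, borrow the label of its nearest
    source embedding and emit an interpolated (embedding, label) pair."""
    synthetic = []
    for t in targets:
        i = nearest_source(t, sources)
        synthetic.append((interpolate(sources[i], t, lam), labels[i]))
    return synthetic


if __name__ == "__main__":
    # Toy 2-d vectors standing in for sentence embeddings.
    english = [[1.0, 0.0], [0.0, 1.0]]
    english_labels = ["positive", "negative"]
    unlabeled_target = [[0.9, 0.1]]
    print(augment(unlabeled_target, english, english_labels))
```

In this sketch the synthetic pairs would then be used as additional fine-tuning examples; how the interpolated embeddings are fed to the model is not specified in the abstract and is left out here.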

Barah Fazili, Preethi Jyothi • 2026

Related benchmarks

Task	Dataset	Result
Natural Language Inference	Natural Language Inference (NLI) (test)	Accuracy 57.56	39
Natural Language Inference	XNLI Ur (dev)	Accuracy 70.6	26
Natural Language Inference	XNLI Hi (dev)	Accuracy 57.35	26
Natural Language Inference	XNLI Ur (test)	Accuracy 0.4992	26
Natural Language Inference	English NLI (dev)	Accuracy 62.73	16
Natural Language Inference	Korean NLI (dev)	Accuracy 83.09	12
Sentiment Analysis	English Sentiment Analysis en, binary (dev)	Accuracy 90.94	8
Natural Language Inference	XNLI Arabic (dev)	Accuracy 68.35	8
Natural Language Inference	XNLI Arabic (test)	Accuracy 62.2	8
Natural Language Inference	XNLI Korean (test)	Accuracy 65.35	8

Showing 10 of 34 rows
