
XITE: Cross-lingual Interpolation for Transfer using Embeddings

About

Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. Towards this goal, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA), prior to interpolation, further boosts performance. Our cross-lingual embedding-based augmentation technique XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference, using XLM-R, for a diverse set of target languages including Korean, Arabic, Urdu and Hindi. Apart from boosting cross-lingual transfer, adaptation using XITE also safeguards against forgetting and maintains task performance on the high-resource language.
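The retrieve, label, and interpolate steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes sentence embeddings have already been computed elsewhere, uses cosine similarity as the "embedding-based similarity", and treats "simple interpolation" as a convex combination with a hypothetical mixing weight `lam`. The LDA projection step is omitted, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_source(target_vec: Vector, source_vecs: Sequence[Vector]) -> int:
    """Index of the labeled source (English) embedding most similar to the target."""
    sims = [cosine(target_vec, s) for s in source_vecs]
    return max(range(len(sims)), key=sims.__getitem__)


def interpolate(src: Vector, tgt: Vector, lam: float = 0.5) -> List[float]:
    """Convex combination lam * src + (1 - lam) * tgt (assumed form of the interpolation)."""
    return [lam * s + (1.0 - lam) * t for s, t in zip(src, tgt)]


def augment(
    target_vecs: Sequence[Vector],
    source_vecs: Sequence[Vector],
    source_labels: Sequence[str],
    lam: float = 0.5,
) -> List[Tuple[List[float], str]]:
    """For each unlabeled target embedding: find its nearest labeled source
    embedding, adopt that label, and emit an interpolated synthetic example."""
    synthetic = []
    for t in target_vecs:
        i = nearest_source(t, source_vecs)
        synthetic.append((interpolate(source_vecs[i], t, lam), source_labels[i]))
    return synthetic


# Toy usage with 2-dimensional stand-ins for real sentence embeddings.
source_vecs = [[1.0, 0.0], [0.0, 1.0]]
source_labels = ["positive", "negative"]
target_vecs = [[0.9, 0.1]]  # unlabeled target-language sentence
for vec, label in augment(target_vecs, source_vecs, source_labels):
    print(label, vec)
```

In this toy case the target vector is closest to the first source vector, so the synthetic example inherits the label "positive" and lies halfway between the two embeddings. How the weight is chosen, and at which model layer the embeddings are mixed, are details the abstract does not specify.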

Barah Fazili, Preethi Jyothi • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Natural Language Inference | Natural Language Inference (NLI) (test) | Accuracy 57.56 | 39 |
| Natural Language Inference | XNLI Ur (dev) | Accuracy 70.6 | 26 |
| Natural Language Inference | XNLI Hi (dev) | Accuracy 57.35 | 26 |
| Natural Language Inference | XNLI Ur (test) | Accuracy 0.4992 | 26 |
| Natural Language Inference | English NLI (dev) | Accuracy 62.73 | 16 |
| Natural Language Inference | Korean NLI (dev) | Accuracy 83.09 | 12 |
| Sentiment Analysis | English Sentiment Analysis en, binary (dev) | Accuracy 90.94 | 8 |
| Natural Language Inference | XNLI Arabic (dev) | Accuracy 68.35 | 8 |
| Natural Language Inference | XNLI Arabic (test) | Accuracy 62.2 | 8 |
| Natural Language Inference | XNLI Korean (test) | Accuracy 65.35 | 8 |
Showing 10 of 34 rows
