XITE: Cross-lingual Interpolation for Transfer using Embeddings
About
Facilitating cross-lingual transfer in multilingual language models remains a critical challenge. To this end, we propose an embedding-based data augmentation technique called XITE. We start with unlabeled text from a low-resource target language, identify an English counterpart in a task-specific training corpus using embedding-based similarities, and adopt its label. Next, we perform a simple interpolation of the source and target embeddings to create synthetic data for task-specific fine-tuning. Projecting the target text into a language-rich subspace using linear discriminant analysis (LDA) prior to interpolation further boosts performance. With XLM-R, XITE yields significant improvements of up to 35.91% for sentiment analysis and up to 81.16% for natural language inference across a diverse set of target languages including Korean, Arabic, Urdu and Hindi. Apart from boosting cross-lingual transfer, adaptation using XITE also safeguards against forgetting and maintains task performance on the high-resource language.
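The matching and interpolation steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the mixing formula, so the cosine similarity, the convex combination, the coefficient `lam`, and the function name `match_and_interpolate` are all assumptions made for illustration. The LDA projection step is omitted, and random vectors stand in for XLM-R embeddings.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def match_and_interpolate(target_emb, source_emb, source_labels, lam=0.5):
    """Hypothetical helper: pair each unlabeled target embedding with its
    most similar labeled source embedding, adopt that label, and mix the two.

    target_emb: (n_t, d), source_emb: (n_s, d), source_labels: (n_s,)
    lam: assumed mixing weight on the target embedding.
    """
    # Cosine similarity between every target and every source embedding.
    sims = l2_normalize(target_emb) @ l2_normalize(source_emb).T
    nearest = sims.argmax(axis=1)        # index of the closest source example
    adopted = source_labels[nearest]     # label borrowed from that example
    # Assumed interpolation: a convex combination of the paired embeddings.
    mixed = lam * target_emb + (1.0 - lam) * source_emb[nearest]
    return mixed, adopted, nearest

# Toy data: the "target" rows are slightly perturbed copies of source rows 3 and 0.
rng = np.random.default_rng(0)
source_emb = rng.normal(size=(5, 8))
source_labels = np.array([0, 1, 0, 1, 1])
target_emb = source_emb[[3, 0]] + 0.01 * rng.normal(size=(2, 8))

mixed, adopted, nearest = match_and_interpolate(
    target_emb, source_emb, source_labels, lam=0.5
)
```

In this toy setup each target row is matched back to the source row it was derived from, so `mixed` and `adopted` form a small synthetic labeled set of the kind that would then be used for task-specific fine-tuning.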
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Natural Language Inference | Natural Language Inference (NLI) (test) | Accuracy | 57.56 | 39 |
| Natural Language Inference | XNLI Ur (dev) | Accuracy | 70.6 | 26 |
| Natural Language Inference | XNLI Hi (dev) | Accuracy | 57.35 | 26 |
| Natural Language Inference | XNLI Ur (test) | Accuracy | 0.4992 | 26 |
| Natural Language Inference | English NLI (dev) | Accuracy | 62.73 | 16 |
| Natural Language Inference | Korean NLI (dev) | Accuracy | 83.09 | 12 |
| Sentiment Analysis | English Sentiment Analysis en, binary (dev) | Accuracy | 90.94 | 8 |
| Natural Language Inference | XNLI Arabic (dev) | Accuracy | 68.35 | 8 |
| Natural Language Inference | XNLI Arabic (test) | Accuracy | 62.2 | 8 |
| Natural Language Inference | XNLI Korean (test) | Accuracy | 65.35 | 8 |