
InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

About

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual multi-granularity texts. This unified view helps us better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.

Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, Heyan Huang, Ming Zhou • 2020
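The abstract describes the contrastive pre-training task only in prose, so here is a minimal, hypothetical sketch of an InfoNCE-style loss over parallel sentence pairs in the spirit of that description. The function name xlco_loss, the temperature value, and the random encoder stand-ins are illustrative assumptions, not the authors' released implementation; see https://aka.ms/infoxlm for the official code.

```python
# Sketch (not the paper's code) of a contrastive loss over bilingual pairs:
# a translation pair (x, y) is treated as two views of the same meaning, and
# an InfoNCE-style objective pulls their encodings together while pushing
# them away from in-batch negatives.
import torch
import torch.nn.functional as F


def xlco_loss(src_repr: torch.Tensor, tgt_repr: torch.Tensor,
              temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE loss over a batch of parallel sentence representations.

    src_repr, tgt_repr: (batch, dim) encodings of the two sides of the
    bilingual pairs; row i of each tensor is a translation pair.
    """
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    # Similarity of every source sentence to every target sentence;
    # the diagonal holds the true translation pairs (positives).
    logits = src @ tgt.t() / temperature
    labels = torch.arange(src.size(0), device=src.device)
    # Cross-entropy with the diagonal as the target class is the standard
    # InfoNCE estimator, a lower bound on the mutual information between
    # the two views.
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    # Stand-in encodings from any sentence encoder (random for illustration).
    src = torch.randn(8, 768)  # e.g. English sentences
    tgt = torch.randn(8, 768)  # their translations
    print(xlco_loss(src, tgt).item())
```

With a batch of N pairs, this loss lower-bounds the mutual information between the two views by log N minus the loss value, which is the sense in which contrastive training "maximizes mutual information" between translations.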

Related benchmarks

Task | Dataset | Metric | Result | Rank
Natural Language Inference | XNLI (test) | Average Accuracy | 81.4 | 167
Cross-lingual Language Understanding | XTREME | XNLI Accuracy | 81.4 | 38
Natural Language Inference | XNLI 1.0 (test) | Accuracy | 81.4 | 38
Question Answering | MLQA (test) | F1 Score | 73.6 | 35
Cross-lingual Question Answering | MLQA v1.0 (test) | F1 (es) | 75.1 | 34
Semantic Entity Recognition | FUNSD | EN Score | 68.52 | 31
Relation Extraction | FUNSD | EN Performance Score | 36.99 | 16
Cross-lingual Sentence Retrieval | Tatoeba (parallel, 14 language pairs) | -- | -- | 14
Relation Extraction | XFUND v1.0 (test) | FUNSD Score | 0.3679 | 12
Semantic Entity Recognition | XFUND | Accuracy (ZH) | 88.68 | 12

(Showing 10 of 20 rows.)

Other info

Code: https://aka.ms/infoxlm
