How multilingual is Multilingual BERT?

About

In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.
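The translation-pair claim can be made concrete with a small retrieval check. The sketch below is a minimal illustration, not the authors' procedure, which this summary does not spell out. It assumes each sentence has already been reduced to one fixed-size vector (for example, by averaging a model's hidden states), and that `src_vecs[i]` and `tgt_vecs[i]` are translations of each other. The function name `nearest_neighbor_accuracy` and the toy vectors are hypothetical placeholders, not values from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbor_accuracy(src_vecs, tgt_vecs):
    """Fraction of source vectors whose most similar target vector
    sits at the same index, i.e. is the aligned translation."""
    hits = 0
    for i, u in enumerate(src_vecs):
        best = max(range(len(tgt_vecs)), key=lambda j: cosine(u, tgt_vecs[j]))
        hits += int(best == i)
    return hits / len(src_vecs)


# Toy placeholder vectors (assumption: stand-ins for pooled sentence
# representations in two languages; not real model outputs).
src = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
tgt = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.7, 0.1, 0.6]]

accuracy = nearest_neighbor_accuracy(src, tgt)
print(f"nearest-neighbor accuracy: {accuracy:.2f}")
```

A high score on real sentence vectors would suggest that translations land close together in the representation space. A low score for a particular language pair would be consistent with the systematic deficiencies the abstract mentions.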

Telmo Pires, Eva Schlinger, Dan Garrette • 2019

Related benchmarks

Task	Dataset	Result
Natural Language Inference	XNLI (test)	Average Accuracy 62.09	167
Clause Classification	Illegal Clauses	Macro F1 81	63
Named Entity Recognition	CoNLL NER 2002/2003 (test)	German F1 Score 69.74	59
Named Entity Recognition	WikiAnn (test)	Average Accuracy 68.51	58
Clause Classification	Dark Clauses	Macro F1 75	23
Clause Classification	Gray Clauses	Macro F1 69	20
Natural Language Inference and Sentiment Analysis	GLUECoS (test)	NLI Accuracy 0.5974	6
Review Rating Classification	Amazon Reviews en, es, fr	Accuracy (de) 50.08	6
Review Rating Classification	Amazon Reviews en ja zh	Acc (de) 0.4946	6
