XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation
About
In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and to evaluate their performance across a diverse set of cross-lingual tasks. Compared to GLUE (Wang et al., 2019), which is labeled in English and covers natural language understanding tasks only, XGLUE has two main advantages: (1) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (2) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model, Unicoder (Huang et al., 2019), to cover both understanding and generation tasks, and evaluate it on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM, and XLM-R for comparison.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Natural Language Inference | XNLI (test) | Average Accuracy | 75.3 | 167 |
| Cross-lingual Question Answering | MLQA v1.0 (test) | F1 (es) | 68.6 | 34 |
| Named Entity Recognition | XGLUE (test) | Score (de) | 70.4 | 6 |
| Part-of-Speech Tagging | XGLUE 1.0 (test) | Accuracy (ar) | 67.3 | 6 |
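The "Average Accuracy" aggregate above (as reported for XNLI) is typically the unweighted mean of per-language accuracies. A minimal sketch of that aggregation, with entirely illustrative toy data (not the official XGLUE scorer), might look like:

```python
# Hypothetical sketch: macro-averaged accuracy across languages, the kind of
# aggregate reported as "Average Accuracy" for XNLI-style evaluation.
# All labels and language sets below are illustrative, not real XGLUE data.

def accuracy(preds, golds):
    """Fraction of predictions that match the gold labels."""
    assert len(preds) == len(golds)
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

def average_accuracy(per_language):
    """Unweighted mean of per-language accuracies."""
    scores = [accuracy(preds, golds) for preds, golds in per_language.values()]
    return sum(scores) / len(scores)

# Toy (prediction, gold) pairs for three languages.
per_language = {
    "en": (["entailment", "neutral"], ["entailment", "neutral"]),        # 1.0
    "fr": (["entailment", "neutral"], ["entailment", "contradiction"]),  # 0.5
    "de": (["neutral", "neutral"], ["neutral", "neutral"]),              # 1.0
}
print(round(average_accuracy(per_language), 3))  # → 0.833
```

Note the unweighted mean treats every language equally regardless of test-set size, which is the convention for cross-lingual averages on XNLI.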