Multilingual Constituency Parsing with Self-Attention and Pre-Training
About
We show that constituency parsing benefits from unsupervised pre-training across a variety of languages and a range of pre-training conditions. We first compare the benefits of no pre-training, fastText, ELMo, and BERT for English and find that BERT outperforms ELMo, in large part due to increased model capacity, whereas ELMo in turn outperforms the non-contextual fastText embeddings. We also find that pre-training is beneficial across all 11 languages tested; however, large model sizes (more than 100 million parameters) make it computationally expensive to train separate models for each language. To address this shortcoming, we show that joint multilingual pre-training and fine-tuning allows sharing all but a small number of parameters between ten languages in the final model. The 10x reduction in model size compared to fine-tuning one model per language causes only a 3.2% relative error increase in aggregate. We further explore the idea of joint fine-tuning and show that it gives low-resource languages a way to benefit from the larger datasets of other languages. Finally, we demonstrate new state-of-the-art results for 11 languages, including English (95.8 F1) and Chinese (91.8 F1).
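As a rough illustration of the "relative error increase" figure quoted above, the sketch below assumes that error is defined as 100 minus the F1 score and that the increase is measured relative to the per-language baseline. The function name and the example F1 values are hypothetical and are not taken from the paper.

```python
def relative_error_increase(f1_baseline: float, f1_new: float) -> float:
    """Percent increase in error (assumed to be 100 - F1) from baseline to new."""
    err_baseline = 100.0 - f1_baseline
    err_new = 100.0 - f1_new
    return 100.0 * (err_new - err_baseline) / err_baseline


# Hypothetical scores chosen for illustration only (not results from the paper):
# a baseline at 90.00 F1 and a shared model at 89.68 F1.
increase = relative_error_increase(90.0, 89.68)
print(f"{increase:.1f}% relative error increase")
```

Under this assumed definition, a small absolute drop in F1 corresponds to a larger relative change in error, because the error rate itself is small.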
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Constituent Parsing | PTB (test) | F1 95.77 | 127 |
| Phrase-structure parsing | PTB (§23) | F1 Score 95.6 | 56 |
| Constituent Parsing | CTB (test) | F1 Score 92.14 | 45 |
| Constituency Parsing | WSJ Penn Treebank (test) | F1 Score 95.77 | 27 |
| Constituency Parsing | CTB 5.1 (test) | F1 Score 91.75 | 25 |
| Constituency Parsing | CTB 5.0 (test) | F1 Score 91.75 | 19 |
| Constituency Parsing | Chinese Treebank 5.1 (test) | F1 Score 91.75 | 13 |
| Multilingual Constituency Parsing | SPMRL 2013 2014 (test) | French Score 87.42 | 13 |
| Constituency Parsing | French Treebank (FTB) SPMRL shared task (test) | F1 87.42 | 8 |