Multilingual Neural RST Discourse Parsing
About
Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language. Previous research under the Rhetorical Structure Theory (RST) has mostly focused on inducing and evaluating models from the English treebank. However, the parsing tasks for other languages such as German, Dutch, and Portuguese are still challenging due to the shortage of annotated data. In this work, we investigate two approaches to establish a neural, cross-lingual discourse parser via: (1) utilizing multilingual vector representations; and (2) adopting segment-level translation of the source content. Experiment results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing on all sub-tasks.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| RST Parsing | English | Span Score87.8 | 6 | |
| RST Parsing | Portuguese (Pt) | Span Score86.5 | 5 | |
| RST Parsing | Spanish (Es) | Span Score0.879 | 5 | |
| RST Parsing | Dutch Nl | Span Score85.9 | 4 | |
| RST Parsing | Basque Eu | Span Score85.1 | 4 | |
| RST Parsing | German De | Span Score83.6 | 4 |