RST Parsing from Scratch
About
We introduce a novel top-down end-to-end formulation of document-level discourse parsing in the Rhetorical Structure Theory (RST) framework. In this formulation, we consider discourse parsing as a sequence of splitting decisions at token boundaries and use a seq2seq network to model the splitting decisions. Our framework facilitates discourse parsing from scratch without requiring discourse segmentation as a prerequisite; rather, it yields segmentation as part of the parsing process. Our unified parsing model adopts a beam search to decode the best tree structure by searching through a space of high-scoring trees. With extensive experiments on the standard English RST discourse treebank, we demonstrate that our parser outperforms existing methods by a good margin in both end-to-end parsing and parsing with gold segmentation. More importantly, it does so without using any handcrafted features, making it faster and easily adaptable to new languages and domains.
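The idea of parsing as a sequence of splitting decisions at token boundaries can be illustrated with a minimal sketch. It is not the authors' model: the seq2seq network is replaced by a hypothetical toy scorer (`make_toy_scorer`, with made-up boundary strengths and a made-up threshold), and beam search is replaced by greedy decoding. All function names here are assumptions for illustration. A span is split recursively at its highest-scoring boundary, and spans with no sufficiently strong boundary become leaves, so segmentation falls out of the parse.

```python
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def make_toy_scorer(strength: Dict[int, float],
                    threshold: float = 0.5) -> Callable[[int, int], Optional[int]]:
    """Hypothetical stand-in for a learned splitting model.

    `strength` maps a token boundary index to a made-up score;
    boundaries not listed score 0.0.
    """
    def best_split(i: int, j: int) -> Optional[int]:
        # Candidate boundaries lie strictly inside the span.
        candidates = [(strength.get(k, 0.0), k) for k in range(i + 1, j)]
        if not candidates:
            return None
        score, k = max(candidates)
        # No boundary is strong enough: treat the span as one discourse unit.
        return k if score >= threshold else None
    return best_split


def parse(i: int, j: int, best_split: Callable[[int, int], Optional[int]]):
    """Greedy top-down parse of tokens [i, j) into a nested binary tree."""
    k = best_split(i, j)
    if k is None:
        return (i, j)  # leaf: an unsplit span
    return (parse(i, k, best_split), parse(k, j, best_split))


def leaves(tree) -> List[Span]:
    """Read the segmentation off the tree: its leaves, left to right."""
    if isinstance(tree[0], int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


# Toy document of 8 tokens with two strong boundaries (assumed values).
scorer = make_toy_scorer({3: 0.9, 5: 0.7})
tree = parse(0, 8, scorer)
segments = leaves(tree)
print(tree)
print(segments)
```

In this sketch the strongest boundary (after token 3) is chosen first, so it ends up highest in the tree. A beam search, as described in the abstract, would instead keep several partial trees alive rather than committing to the single best split at each step.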
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| RST Discourse Parsing | RST-DT Parseval (test) | Span (S) Score | 74.3 | 32 |
| Discourse Parsing | RST-DT (test) | Speedup | 11.1 | 11 |
| RST Parsing | RST-DT (test) | Span Score | 74.3 | 7 |
| End-to-end RST parsing | RST-DT En (test) | Segmentation Score | 96.3 | 7 |
| RST Parsing | English RST treebank (test) | Span Score | 68.4 | 4 |