Discourse Coherence in the Wild: A Dataset, Evaluation and Methods
About
To date, there has been very little work on assessing discourse coherence methods on real-world data. To address this, we present a new corpus of real-world texts (GCDC) as well as the first large-scale evaluation of leading discourse coherence algorithms. We show that neural models, including two that we introduce here (SentAvg and ParSeq), tend to perform best. We analyze these performance differences and discuss patterns we observed in low-coherence texts in four domains.
Alice Lai, Joel Tetreault • 2018
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Coherence classification | GCDC 1.0 (test) | Clinton F1 | 61 | 26 |
| Discourse Coherence Classification | GCDC Yahoo 1.0 (test) | Accuracy | 54.9 | 21 |
| Discourse Coherence Classification | GCDC Enron 1.0 (test) | Accuracy | 56.5 | 21 |
| Discourse Coherence Classification | GCDC Yelp 1.0 (test) | Accuracy | 57.5 | 21 |
| Discourse Coherence Classification | GCDC Clinton 1.0 (test) | Accuracy | 60.2 | 21 |
| Sentence ordering | GCDC 1.0 (test) | Yahoo Accuracy | 58.3 | 13 |
| Sentence ordering | WSJ (test) | PRA | 74.1 | 13 |
| 2-way classification | GCDC | Yahoo Score | 48.1 | 12 |
| Coherence Score Prediction | GCDC | Yahoo Coherence Score | 0.519 | 12 |
| Coherence classification | GCDC | Coherence Score (Clinton) | 61.05 | 9 |