| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reasoning evaluation | DialogSum | Reasoning | 99.1 | 33 |
| Summarization | DIALOGSUM | ROUGE-2 | 32.34 | 17 |
| Summarization | DialogSum 1.5k examples (val) | ROUGE-L | 39.1 | 11 |
| Summarization | DIALOGSUM | Std Dev ROUGE-1 | 0.83 | 8 |
| Dialogue Summarization | DialogSum | R-1 | 47.8 | 7 |
| Dialogue Summarization | DialogSum 50 samples (test) | Informativeness | 4.03 | 3 |