| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Summarization | MultiNews LongBench (test) | ROUGE-L Score 30.91 | 33 |
| Summarization | MultiNews | F1 Score 34.79 | 28 |
| Summarization | MultiNews (test) | Comprehensiveness 4.98 | 24 |
| Long-context Text Summarization | MultiNews 128K context | ROUGE-L 29.5 | 18 |
| Multi-document Summarization | MultiNews | MAT 1 | 15 |
| Document Summarization | MultiNews | ASR 87 | 14 |
| Long-context Summarization | MultiNews | MAT 5.97 | 11 |
| Summarization | MultiNews | ROUGE Score 24.6 | 10 |
| Summarization | MultiNews | ROUGE-1 Std Dev 0.11 | 8 |
| Watermarking | MultiNews | Generation Metric Score 25.86 | 6 |
| Text Summarization Hallucination Evaluation | MultiNews | Accuracy 19 | 6 |
| Latency Evaluation | MultiNews | End-to-End Latency 3.86 | 6 |
| Faithfulness Discrimination | MultiNews | AUC 0.755 | 4 |
| Summarization | MultiNews | Accuracy 25.3 | 2 |