| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long document summarization | BookSum (test) | ROUGE-1 43.19 | 37 |
| Summarization | BookSum (test) | Comp Score 5 | 24 |
| Document Summarization | BookSum | ROUGE-1 Score 46.62 | 22 |
| Long-context Input (Summarization) | BookSum | TPT (s) 3.76 | 20 |
| Spoofing Attack Robustness | BookSum | AUC 0.9552 | 20 |
| Paraphrase Attack Robustness | BookSum | AUC 98.49 | 20 |
| Summarization | BookSum Chapter Level | ROUGE-1 42.68 | 14 |
| Language Modeling | BookSum | Perplexity 19.35 | 13 |
| Summarization Faithfulness | BookSum | SummaC Score 39.84 | 12 |
| Abstractive Summarization | BookSum sampled (test) | ROUGE Score 17.71 | 12 |
| Faithfulness Evaluation | BookSum (test) | SummaC 40.71 | 12 |
| Grounded Payoff Tracking | BookSum | Detection Accuracy 69.8 | 12 |
| Narrative Reasoning | BookSum oracle timing | Average Score 94 | 12 |
| Watermarking Efficiency | BookSum | Total Time (s) 1,224.25 | 10 |
| Summarization | BookSum Average latest (test) | Average ROUGE 17.47 | 6 |
| Summarization | BookSum Trun. latest (test) | Avg ROUGE 16.68 | 6 |
| Summarization | BookSum No Trun. latest (test) | Average ROUGE 17.74 | 6 |