| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| XSum (test) | | ROUGE-2 60.61 | 276 | 4d ago | |
| arXiv (test) | Top Down Transformer | ROUGE-1 64.16 | 161 | 3mo ago | |
| PubMed (test) | ORACLE | ROUGE-1 61.99 | 114 | 20d ago | |
| Xsum | ST-MoE | ROUGE-2 27.1 | 108 | 3mo ago | |
| Arxiv | | ROUGE-2 23.05 | 76 | 3mo ago | |
| FeedSum | GPT4o | Scom 4.09 | 72 | 1mo ago | |
| PubMed | LongT5 | ROUGE-1 50.23 | 70 | 3mo ago | |
| CNN Daily Mail | PEGASUS-2B (calibrated) | ROUGE-1 47.97 | 67 | 3mo ago | |
| SamSum | | PRR -0.113 | 66 | 1mo ago | |
| XSum | TAD | PRR 0.617 | 66 | 1mo ago | |
| CNNDM | Diversed | ROUGE-2 12.64 | 62 | 1mo ago | |
| bigPatent | OracleFrag | ROUGE-1 91.85 | 61 | 3mo ago | |
| TL;DR | SignCert-PO | Winrate 91.8 | 59 | 6d ago | |
| CNN/DM | | ROUGE-1 56.22 | 56 | 3mo ago | |
| CNN/Daily Mail original, non-anonymized (test) | Best Previous Abstractive | ROUGE-1 41.69 | 54 | 3mo ago | |
| LongBench | | GovRep Score 33.39 | 51 | 2mo ago | |
| TL;DR (test) | GRPO | Win Rate 82.5 | 49 | 3mo ago | |
| XSum | | ROUGE-2 9.16 | 46 | 1mo ago | |
| CNN | TAD | PRR 0.444 | 44 | 1mo ago | |
| XSum | ZO-Adam | ROUGE-L 27.45 | 42 | 7d ago | |
| Newsroom (test) | TLM+E (G,G) | ROUGE-2 74 | 40 | 3mo ago | |
| Summarization | Curriculum-RLAIF | Win Rate 95 | 39 | 1mo ago | |
| LongBench GovReport | | ROUGE-L 34.61 | 38 | 11d ago | |
| Gigaword (test) | Aghajanyan et al. | ROUGE-2 20.7 | 38 | 3mo ago | |
| Gigaword | UNIMO | ROUGE-L 36.88 | 38 | 3mo ago | |