| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| XSum (test) | | ROUGE-2 | 60.61 | 231 | 2d ago |
| arXiv (test) | Top Down Transformer | ROUGE-1 | 64.16 | 161 | 3d ago |
| Xsum | ST-MoE | ROUGE-2 | 27.1 | 108 | 3d ago |
| PubMed (test) | ORACLE | ROUGE-1 | 61.99 | 107 | 3d ago |
| Arxiv | | ROUGE-2 | 23.05 | 76 | 3d ago |
| PubMed | LongT5 | ROUGE-1 | 50.23 | 70 | 3d ago |
| CNN Daily Mail | PEGASUS-2B (calibrated) | ROUGE-1 | 47.97 | 67 | 2d ago |
| bigPatent | OracleFrag | ROUGE-1 | 91.85 | 61 | 3d ago |
| CNN/DM | | ROUGE-1 | 56.22 | 56 | 3d ago |
| CNN/Daily Mail original, non-anonymized (test) | Best Previous Abstractive | ROUGE-1 | 41.69 | 54 | 3d ago |
| TL;DR (test) | GRPO | Win Rate | 82.5 | 49 | 3d ago |
| Newsroom (test) | TLM+E (G,G) | ROUGE-2 | 74 | 40 | 3d ago |
| Gigaword (test) | Aghajanyan et al. | ROUGE-2 | 20.7 | 38 | 3d ago |
| Gigaword | UNIMO | ROUGE-L | 36.88 | 38 | 2d ago |
| Newsroom (test) | MARS (default) | Pearson Correlation | 0.372 | 36 | 3d ago |
| CNN/DM | DOUBLE | M Score | 10.64 | 35 | 3d ago |
| CNN/DM | TALON | Speedup | 3.58 | 32 | 3d ago |
| SAMSum Full 2019 | CIPHER | F1 Score | 37 | 30 | 3d ago |
| SAMSum | CriSPO | BERTScore F1 | 91.3 | 30 | 3d ago |
| eBay Email | Llama 3.1 8B | Mean Auto-Rating (r) | 2.062 | 29 | 3d ago |
| eBay WebForm | Llama 3.1 8B | Mean Auto-Rating (r) | 4.325 | 29 | 3d ago |
| eBay Teammate Chat | Llama 3.2 3B | Mean Auto-Rating (r) | 4.47 | 29 | 3d ago |
| eBay Bot Chat | Gemma3 1B | Mean Auto-Rating (r) | 3.52 | 29 | 3d ago |
| CNN/DailyMail | KLE | Hamming Score | -0.276 | 28 | 3d ago |
| BillSum | Dense Model | Accuracy | 69.6 | 28 | 3d ago |