| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Abstractive Summarization | CNN/DailyMail full length F-1 (test) | ROUGE-1 41.69 | 48 |
| Open ended generation | CNN DailyMail | ROUGE-L 24.3 | 40 |
| Language Generation | CNN/DailyMail | Accuracy 27.16 | 35 |
| Summarization | CNN/DailyMail (test) | ROUGE-L 48 | 33 |
| Uncertainty Quantification | CNN/DailyMail | Hamming AUC 0.745 | 28 |
| Summarization | CNN/DailyMail | Hamming Score -0.276 | 28 |
| Abstractive Summarization | CNN/DailyMail | ROUGE-1 44.51 | 25 |
| Summarization | CNN/DailyMail | RougeL 23.23 | 21 |
| Abstractive Summarization | CNN/DailyMail Summarization | Hamming Distance 1.597 | 20 |
| Reranking | CNN DailyMail | R-1 52.43 | 15 |
| Text Summarization | CNN DailyMail | ROUGE-1 38.58 | 13 |
| Length-Constrained Text Generation | CNN/DailyMail | Win Rate 16.43 | 10 |
| Text Generation | CNN/DailyMail (test) | LCTG Error Rate (E) 3.18 | 10 |
| Text Summarization | CNN/DailyMail (test) | ROUGE-1 33.23 | 9 |
| Context Attribution | CNN Dailymail (1000 examples) | Log Probability Drop 1.48 | 9 |
| News Summarization | CNN DailyMail | BLEU 5.41 | 8 |
| Extractive Summarization | CNN/DailyMail anonymized (test) | ROUGE-1 42.69 | 8 |
| Summarization | CNN/DailyMail | Distribution Time (s) 4.68 | 7 |
| Summarization | CNN/DailyMail 50 document sample (sampled) | PPL 0.3 | 7 |
| Summarization | CNN/DailyMail (evaluation) | ROUGE-1 44.45 | 7 |
| Text Summarization | CNN/DailyMail 100-example subset (ACU protocol) (test) | ACU 0.4421 | 6 |
| Summarization | CNN/DailyMail human evaluation (100 samples) | Relevance Score 43 | 6 |
| Hallucination Detection | CNN/DailyMail | AvgWD 0.694 | 5 |
| Abstractive Summarization | CNN/DailyMail | Baseline Throughput (samples/s) 3.4 | 5 |
| Summarization | CNN/DailyMail | R-1 Score 37.44 | 4 |