| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Abstractive Summarization | CNN/DailyMail full length F-1 (test) | ROUGE-1 41.69 | 48 |
| Open ended generation | CNN DailyMail | ROUGE-L 24.3 | 40 |
| Language Generation | CNN/DailyMail | Accuracy 27.16 | 35 |
| Uncertainty Quantification | CNN/DailyMail | Hamming AUC 0.745 | 28 |
| Summarization | CNN/DailyMail | Hamming Score -0.276 | 28 |
| Abstractive Summarization | CNN/DailyMail | ROUGE-1 44.51 | 25 |
| Summarization | CNN/DailyMail (test) | 1st Metric 44.16 | 22 |
| Abstractive Summarization | CNN/DailyMail Summarization | Hamming Distance 1.597 | 20 |
| Length-Constrained Text Generation | CNN/DailyMail | Win Rate 16.43 | 10 |
| Text Generation | CNN/DailyMail (test) | LCTG Error Rate (E) 3.18 | 10 |
| Text Summarization | CNN/DailyMail (test) | ROUGE-1 33.23 | 9 |
| Context Attribution | CNN Dailymail (1000 examples) | Log Probability Drop 1.48 | 9 |
| News Summarization | CNN DailyMail | BLEU 5.41 | 8 |
| Extractive Summarization | CNN/DailyMail anonymized (test) | ROUGE-1 42.69 | 8 |
| Text Summarization | CNN DailyMail | ROUGE-1 38.58 | 7 |
| Summarization | CNN/DailyMail | Distribution Time (s) 4.68 | 7 |
| Summarization | CNN/DailyMail 50 document sample (sampled) | PPL 0.3 | 7 |
| Summarization | CNN/DailyMail (evaluation) | ROUGE-1 44.45 | 7 |
| Text Summarization | CNN/DailyMail 100-example subset (ACU protocol) (test) | ACU 0.4421 | 6 |
| Summarization | CNN/DailyMail human evaluation (100 samples) | Relevance Score 43 | 6 |
| Abstractive Summarization | CNN/DailyMail | Baseline Throughput (samples/s) 3.4 | 5 |
| Abstractive Text Summarization | CNN/DailyMail | QA Score 56.1 | 4 |
| Summarization | CNN/DailyMail random subset | Non-Redundancy 159 | 4 |
| Entailment Classification | CNN DailyMail (test) | Avg Entailment Probability 91.2 | 2 |