| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Dialogue Summarization | SAMSum (test) | ROUGE-2 | 33 | 80 |
| Abstractive Summarization | SAMSum | ROUGE-2 | 28.97 | 73 |
| Abstractive dialogue summarization | SAMSum (test) | ROUGE-L | 52.7 | 53 |
| Few-shot Learning | SAMSum | Score | 41.62 | 40 |
| Summarization | SAMSum Full 2019 | F1 Score | 37 | 30 |
| Summarization | SAMSum | BERTScore F1 | 91.3 | 30 |
| Factual Consistency Evaluation | SAMSum | Spearman Correlation | 46.7 | 30 |
| Factual Consistency Evaluation | SamSum (test) | Pearson Correlation Coefficient | 44.6 | 22 |
| Meeting Summarization | SamSum | HPI | 6.4347 | 22 |
| Summarization | SAMSum | AlignScore | 89.5 | 19 |
| Summarization | SamSum (test) | ROUGE-1 | 53.4 | 18 |
| Language Modeling | SAMSum | Perplexity | 31.18 | 13 |
| Summarization Faithfulness | SAMSum | SummaC | 41.08 | 12 |
| Abstractive Summarization | SAMSum sampled (test) | ROUGE Score | 26.88 | 12 |
| Faithfulness Evaluation | SAMSum (test) | SummaC | 29.58 | 12 |
| Summarization | SAMSum | Completeness | 4.98 | 12 |
| Summarization | SAMSum | ROUGE-L | 31.46 | 12 |
| Dialogue Summarization | SAMSum 1.0 (test) | R1 | 51 | 11 |
| Output OOD Detection | Samsum | AUROC | 99.99 | 10 |
| Dialogue Summarization | SAMSum | ROUGE-2 | 29.88 | 10 |
| Summarization | Samsum | PPL | 4.02 | 9 |
| Input OOD Detection | Samsum | AUROC | 1 | 8 |
| Factual Consistency Evaluation | SamSum | Pearson Correlation Coefficient | 47.7 | 8 |
| Factual Consistency Evaluation | SamSum | Kendall's Tau | 38.2 | 8 |
| Abstractive Summarization | SAMSum (val) | ROUGE-1 | 53.8 | 8 |