| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Summarization | Summarization | ROUGE-L | 26.77 | 18 |
| Summarization | Summarization dataset | ROUGE-L F1 | 67.8 | 16 |
| Summarization | Summarization | Edit Distance | 6,573 | 12 |
| Summarization | Summarization (Weak User) | Mean Total Edit Distance | 0.2577 | 10 |
| Summarization | Summarization (Strong User) | Mean Total Edit Distance | 0.1845 | 10 |
| Summarization | Summarization | ROUGE-L | 18.4 | 10 |
| Summarization | Summarization | Grade | 73.18 | 6 |
| Summarization | Summarization Human Evaluation (test) | Consistency | 4 | 6 |
| Critique Quality Evaluation | Summarization | Win Rate | 75 | 6 |
| Faithfulness | Summarization (test) | Reward | -0.268 | 4 |
| Label Aggregation Assessment | Summarization (test) | Test Accuracy | 68 | 4 |
| Summarization | Summarization (test) | Rank 1 Frequency | 0.52 | 4 |