| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Natural Language Generation | E2E (test) | ROUGE-L 89.94 | 79 |
| Data-to-text generation | E2E | ROUGE-L 0.717 | 36 |
| Data-to-Text Generation | E2E (test) | BLEU 68.23 | 33 |
| Dialogue Generation | E2E | BLEU 64.81 | 10 |
| Hallucination Detection | E2E (test) | F1-R 90 | 10 |
| Natural Language Generation | E2E en | ROUGE-2 45.8 | 9 |
| Data-to-text generation | Cleaned E2E (test) | BLEU 44.15 | 9 |
| Data-to-Text Generation | E2E | ROUGE-2 37.6 | 8 |
| Natural Language Generation | E2E | BLEU 70.4 | 7 |
| Vulnerability Attack Performance | E2E Cross-domain (dev) | ASR (E-commerce) (CodeQL) 0.8333 | 6 |
| Vulnerability Attack Performance | E2E (dev) | ASR (Semgrep) - E-commerce 100 | 6 |
| Span-level classification | E2E (test) | F1 Score (Span) 56 | 6 |
| Generation from meaning representations | E2E (test) | BLEU 0.686 | 6 |
| Semantic Content Control | E2E (test) | Control Score 89.9 | 5 |
| Error Tracing | E2E Synthetic Hallucination | auPR (England-China) 94.14 | 5 |
| Hallucination Retrieval | E2E dataset | AuPR 71.6 | 5 |
| Controllable Text Generation | E2E Syntax Spans | Ctrl 95.33 | 5 |
| Controllable Text Generation | E2E Semantic Content | Ctrl 85.06 | 5 |
| Factual Correctness | E2E Original (test) | Add 0.14 | 5 |
| Data-to-Text Generation | E2E Cleaned | Fluency 5.46 | 5 |
| Drought Impact Extraction | E2E dataset | Accuracy 78.8 | 4 |
| Natural Language Generation | E2E | GFLOPs per token 4.65 | 4 |
| Data-to-Text Generation | E2E clean MR | BLEU 22.6 | 4 |
| Data-to-text generation | E2E+ | BLEU 0.6292 | 3 |
| Data-To-Text | GEM E2E en (test) | ROUGE-2 - | 0 |