| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | GovReport | Perplexity 2.5 | 50 |
| Document Summarization | GovReport (test) | ROUGE-1 75.56 | 50 |
| Long-context language generation | GovReport | Speedup (×) 3.36 | 32 |
| Long-context Input (Summarization) | GovReport | Speedup 3.09 | 31 |
| Summarization | GovReport | ROUGE Score 46 | 30 |
| Summarization | GovReport (val) | ROUGE-L 47.2 | 17 |
| Document Summarization | GovReport | ROUGE-1 60.64 | 15 |
| Document Summarization | GovReport | ASR Score 89 | 14 |
| Summarization | GovReport | Token MAE 17.4 | 14 |
| Context Compression | GovReport | ROUGE-1 61.39 | 13 |
| Summarization | GovReport (test) | ROUGE-1 0.555 | 13 |
| Summarization | GovReport 16K | 95th Percentile Latency (ms) 118 | 12 |
| Long-Context Summarization | GovReport | ROUGE-1 Score 22.03 | 12 |
| Prompt Injection Attack | GovReport | Attack Success Rate (ASR) 0 | 11 |
| Long-Context Summarization | GovReport (test) | ROUGE-1 16.11 | 10 |
| Plain Summarization | GovReport | ROUGE-1 46.93 | 9 |
| Document Summarization | GovReport | G-mean 18.43 | 9 |
| Long-context Question Answering | GovReport | C Score 68.4 | 9 |
| Long-context answering with citations | GovReport | Citation Recall 82.8 | 9 |
| Summarization | GovReport | BERTScore 0.92 | 7 |
| Extractive Summarization | GovReport (GR) | Accuracy 75.3 | 5 |
| Text Simplification | GovReport 500 samples (human evaluation) | Coherence 4.27 | 4 |
| Prompt Injection Attack | GovReport | ASR 100 | 4 |
| Summarization Evaluation (Human Correlation) | GovReport (test) | Relevance 61 | 4 |
| Summarization | GovReport SCROLLS (test) | ROUGE-1 48.18 | 3 |