| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | GovReport | Perplexity 2.5 | 50 |
| Document Summarization | GovReport (test) | ROUGE-1 75.56 | 50 |
| Long-context Input (Summarization) | GovReport | Time Per Token (TPT) 6.62 | 20 |
| Summarization | GovReport (val) | ROUGE-L 47.2 | 17 |
| Document Summarization | GovReport | ROUGE-1 60.64 | 15 |
| Summarization | GovReport | Token MAE 17.4 | 14 |
| Long-Context Summarization | GovReport (test) | ROUGE-1 16.11 | 10 |
| Long-Context Summarization | GovReport | ROUGE-1 Score 17.23 | 10 |
| Long-context Question Answering | GovReport | C Score 68.4 | 9 |
| Long-context answering with citations | GovReport | Citation Recall 82.8 | 9 |
| Summarization | GovReport | BERTScore 0.92 | 7 |
| Summarization | GovReport (test) | ROUGE-1 0.555 | 4 |
| Summarization Evaluation (Human Correlation) | GovReport (test) | Relevance 61 | 4 |
| Pairwise Preference Evaluation | GovReport | Pairwise Win Rate 52.9 | 3 |
| Summarization | GovReport LongBench (test) | ROUGE-1 Score 30.91 | 3 |
| Model Ranking | GovReport ROUGE-L (test) | Kendall Tau (τ) 0.823 | 2 |