| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-context language understanding | InfiniteBench | En.Sum | 32.93 | 63 |
| Long-context understanding | InfiniteBench v1 (test) | Dialogue | 20 | 31 |
| Long-context understanding | InfiniteBench | En. MC Accuracy | 0.6812 | 12 |
| Key-Value Retrieval | InfiniteBench 8k | Accuracy | 96 | 12 |
| Key-Value Retrieval | InfiniteBench 4k | Accuracy | 100 | 12 |
| Long-context language modeling | InfiniteBench (test) | En Sum Score | 1 | 10 |
| Key-Value Retrieval | InfiniteBench 16k | Accuracy (%) | 87 | 10 |
| Long-context reasoning | InfiniteBench (test) | Reasoning Pa Score | 87.63 | 6 |
| Long-context understanding | InfiniteBench (test) | En QA F1 | 36.7 | 6 |
| Long context understanding | InfiniteBench En.MC | Accuracy | 83.4 | 5 |
| Long-context language understanding | InfiniteBench | InfiniteBench QA (EN) Score | 7.84 | 4 |
| Math Find | InfiniteBench | Performance (8k Context) | 37.14 | 3 |
| KV | InfiniteBench | KV Retrieval Score (8k) | 6.2 | 3 |