| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-context language understanding | InfiniteBench | En.Sum 33.01 | 81 |
| Long-context understanding | InfiniteBench v1 (test) | Dialogue 20 | 31 |
| Long-context understanding | InfiniteBench | Math Score (F) 0.4771 | 22 |
| Long-context language modeling | InfiniteBench (test) | En QA Score 34.82 | 14 |
| Key-Value Retrieval | InfiniteBench 8k | Accuracy 96 | 12 |
| Key-Value Retrieval | InfiniteBench 4k | Accuracy 100 | 12 |
| Key-Value Retrieval | InfiniteBench 16k | Accuracy (%) 87 | 10 |
| Code Debug | InfiniteBench Code Debug | Accuracy 74.37 | 7 |
| Long-context reasoning | InfiniteBench (test) | Reasoning Pa Score 87.63 | 6 |
| Long-context understanding | InfiniteBench (test) | En QA F1 36.7 | 6 |
| Long-context understanding | InfiniteBench En.MC | Accuracy 83.4 | 5 |
| Long-context language understanding | InfiniteBench | InfiniteBench QA (EN) Score 7.84 | 4 |
| Math Find | InfiniteBench | Performance (8k Context) 37.14 | 3 |
| KV | InfiniteBench | KV Retrieval Score (8k) 6.2 | 3 |
| Long-context Modeling | InfiniteBench | Decoding Speedup 9 | 1 |