| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-context Reasoning | Long-context Benchmarks 100K context LB-V2 DocMath Frames LB-MQA (test) | DocMath Score: 66.7 | 36 |
| Long-context Reasoning | Long-context Benchmarks 16K context DocMath Frames LB-MQA V2 (test) | DocMath: 64.1 | 36 |
| Fact chaining & relational reasoning | Long-context benchmarks | Accuracy (8k Context): 52.8 | 21 |
| Multi-round co-reference resolution | Long-context benchmarks | Score (8k Context): 38.5 | 21 |
| Passage re-ranking | Long-context benchmarks | Performance (8k Context): 50.5 | 21 |
| Synthetic recall | Long-context benchmarks | Synthetic Recall (8k context): 100 | 21 |
| Retrieval-Augmented Generation | Long-context benchmarks | RAG Score (8k Context): 53.7 | 16 |
| Long Context Evaluation | Long Context Benchmarks | MDQA-10 Score: 32.3 | 5 |