| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Long-context Reasoning | Long-context Benchmarks 100K context LB-V2 DocMath Frames LB-MQA (test) | DocMath Score: 66.7 | 36 |
| Long-context Reasoning | Long-context Benchmarks 16K context DocMath Frames LB-MQA V2 (test) | DocMath: 64.1 | 36 |
| Long Context Evaluation | Long Context Benchmarks | MDQA-10 Score: 32.3 | 5 |