| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| Computational and Knowledge-Intensive Reasoning Tasks (AIME24, AIME25, MATH500, GSM8K, MATH, WebWalker, HQA, 2Wiki., MuSiQ., Bamb.) latest (test) | RAPO | AIME 24 Score | 34.2 | 30 | 1mo ago |
| Overall 9 Benchmarks | AutoTraj | Average Score | 88 | 9 | 1mo ago |
| AIME 25 | Token-ALP | Test Accuracy | 31.25 | 6 | 26d ago |
| AIME 24 | Seq-ALP | Test Accuracy | 43.85 | 6 | 26d ago |
| Olympiad Bench | Seq-ALP | Test Accuracy | 52.75 | 6 | 26d ago |
| Minerva Math | Token-ALP | Test Accuracy | 43.18 | 6 | 26d ago |
| TIR-Bench | DeepEyesV2-RL | Score | 20.8 | 4 | 1mo ago |