| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME 2024, AIME 2025, AMC 2023, HMMT Feb 2025, OlymMath) (test) | Accuracy | 58.2 | 56 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AMC, AIME 2024, AIME 2025, Minerva, MATH, Olympiad) various (test, val) | Average Score | 62.1 | 55 |
| Mathematical Reasoning | Mathematical Reasoning Suite (GSM8K, MATH, SVAMP, SimulEq, AQuA, SAT, MMLU) | Accuracy (Aggregate) | 70.6 | 40 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, AMC23, MATH, MIN., OLY.) | AIME24 Score | 43.3 | 31 |
| Mathematical Reasoning | Mathematical Reasoning Suite (MATH 500, AIME 2024, AIME 2025, AMC 2023, Olympiad Bench) | Average Score | 76.36 | 29 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, AIME25, MATH, MINERVA, GPQA, GSM8K) Standard (test) | AIME24 Score | 68.75 | 26 |
| Mathematical Reasoning | Mathematical Reasoning Suite (MathQA, GSM8K, AddSub, SingleEq, SVAMP) | MathQA Accuracy | 91.88 | 24 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, Math500, Olympiad, AMC, Minerva) | AIME24 Score | 60.58 | 23 |
| Mathematical Reasoning | Mathematical Reasoning Suite (Math500, Gaokao En, Olympiad, GSM8K, AMC23, AIME25, AIME24) | Math500 Score | 92.8 | 18 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, AMC23, GameOf24) (test) | AIME24 Accuracy | 40 | 17 |
| Mathematical Reasoning | Mathematical Reasoning Suite (Overall) | Average Score | 63.9 | 16 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME24, AIME25, AMC23, GSM8K, MATH500) | AIME 2024 Score | 25.9 | 15 |
| Mathematical Reasoning | Mathematical Reasoning Suite (MATH500, Olympiad, AMC, AIME24, AIME25, GSM8K) pass@1 (test) | Average Score | 44.4 | 14 |
| Mathematical Reasoning | Mathematical Reasoning Suite (A24, A25, AMC, MATH, Minerva) | A24 Score | 52.29 | 13 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AMC, Minerva, MATH, GSM8K, Olympiad, AIME25, AIME24) | Overall Average Score | 56.46 | 12 |
| Mathematical Reasoning | Mathematical Reasoning Suite (AIME25, HMMT-Feb, HMMT-Nov, MATH500, Minerva) (test) | AIME25 Accuracy | 61.2 | 10 |