| Task Name | Dataset Name | SOTA Metric | SOTA Value | Trend |
|---|---|---|---|---|
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (GSM8K, MATH, AMC23, Olympiad, Minerva) (test) | GSM8K Accuracy | 94 | 32 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (AIME24, MATH500, Olympiad, AMC, Minerva) (test) | AIME24 Score | 33.3 | 22 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (AddSub, AQuA, GSM8k, MultiArith, SingleEq, SVAMP) (test) | Accuracy (AddSub) | 96.96 | 18 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks | AIME 24 Score | 59.8 | 16 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (test) | Average Score | 53.72 | 16 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks Average | Pass@4 Score | 71.88 | 14 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (AIME24, AMC23, MATH500) | AIME24 Score | 15 | 6 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (MATH, AIME25, AMC, MINERVA, KAOYAN, OLYMPIAD, CN_MATH24) | MATH Accuracy | 91.1 | 4 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (AIME24, Math500, Olympiad, AMC, Minerva) | AIME24 Score | 15.6 | 3 |
| Mathematical Reasoning | Mathematical Reasoning Benchmarks (AIME 24, AIME 25, AMC, Minerva, MATH-500, Olympiad, Gaokao23) (test) | AIME 2024 Score | 43.3 | 3 |