| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | Math-domain reasoning benchmarks (MATH-500, Olympiad, Minerva, GSM8K, AMC, AIME24) (test) | Overall Score: 58.12 | 20 |
| Mathematical Reasoning | Math-domain reasoning benchmarks (GSM8K, MATH, MathQA) MathPile (test) | GSM8K Accuracy: 49.36 | 8 |