| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Mathematical Reasoning | LMB (LiveMathBench) | Accuracy 35.8 | 23 |
| Mathematical Reasoning | LiveMathBench v202505 (test) | Avg@4 17.5 | 20 |
| Mathematical Reasoning | LiveMathBench | Accuracy 74.38 | 19 |
| Mathematical Reasoning | LiveMathBench 202505 | Accuracy 32.17 | 11 |
| Math | LiveMathBench v202505 | Avg@4 Score 38.87 | 8 |
| Mathematical Reasoning | LiveMathBench (full) | Pass@1 77.54 | 6 |
| Mathematical Reasoning | LiveMathBench v202505 | Avg@4 6.8 | 4 |
| Math problem-solving | LiveMathBench | AIME 24 Score 100 | 4 |