| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| MATH | DeepSeek-R1 | Accuracy | 97.6 | 166 | 3d ago |
| AIME 2024 | | Accuracy | 100 | 60 | 3d ago |
| AIME 25 | | Accuracy | 93.3 | 54 | 3d ago |
| AIME | DeepSeek-R1 | AIME Score | 86.7 | 35 | 3d ago |
| Gaokao MathQA | Qwen2.5-Math-72B | Accuracy | 86.3 | 30 | 3d ago |
| MinervaMath (test) | PASER | Accuracy | 21.2 | 28 | 3d ago |
| MathVerse (testmini) | | Accuracy | 64.9 | 28 | 3d ago |
| MATH-Vision (test) | | Accuracy | 68.8 | 26 | 3d ago |
| MATH (test) | Gemini-Ultra | Accuracy | 53.2 | 25 | 3d ago |
| MATH eval (test) | TROLL | Success Rate | 59.1 | 20 | 3d ago |
| MATH | DeepSeekMath-Base | Overall Accuracy | 0.362 | 20 | 3d ago |
| MATH | | Accuracy @ t1 | 47.4 | 18 | 3d ago |
| Gaokao MathCloze | | Accuracy | 72.9 | 18 | 3d ago |
| AIME VeRA-H Pro 2024-II | | Avg@5 Accuracy | 78.6 | 16 | 3d ago |
| AIME VeRA-H 2024-II | | Avg@5 (%) | 0.909 | 16 | 3d ago |
| AIME Seeds 2024 II | | Avg@5 (%) | 94.3 | 16 | 3d ago |
| MATH 519 problems (test) | SC-MAS | Accuracy | 76.75 | 16 | 3d ago |
| MATH standard (val) | PiSSA | Accuracy | 31.33 | 15 | 3d ago |
| HMMT February 2025 | | pass@1 | 97.5 | 13 | 3d ago |
| AIME 2025 | gpt-oss-120b | Score | 91.7 | 13 | 3d ago |
| MATH | | Accuracy | 95.7 | 13 | 3d ago |
| TheoremQA TQ-Math | DeepMath (RULES) | Exact Match Accuracy | 57.7 | 12 | 3d ago |
| MATH-Perturb MP-simple | DeepMath (GRPO) | Exact Match Accuracy | 72.8 | 12 | 3d ago |
| MATH-Perturb MP-hard | DeepMath (RULES) | Exact Match Accuracy | 56.3 | 12 | 3d ago |
| IneqMath (IM) | DeepMath (vanilla) | Exact Match Accuracy | 76 | 12 | 3d ago |