| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| GSM8K (test) | | Accuracy 94.5 | 155 | 2d ago | |
| GSM8K | S-GRPO | Accuracy 93.8 | 126 | 3d ago | |
| MATH | MoCAN | Accuracy 75.5 | 88 | 3d ago | |
| AMC | LLM-J | Accuracy 80 | 70 | 2d ago | |
| JEEBench | APRM | Accuracy 74.4 | 60 | 3d ago | |
| OlympiadBench | APRM | Accuracy 90.7 | 54 | 3d ago | |
| MSVAMP (test) | Language Steering | Average Accuracy 83.9 | 45 | 3d ago | |
| MATH500 | ReST-MCTS | Accuracy 93.2 | 41 | 2d ago | |
| MATH 500 | DEPO | Accuracy 94.4 | 38 | 2d ago | |
| AIME 2024 | ExOPD | Accuracy 0.627 | 37 | 3d ago | |
| OlympiadB | APRM | Accuracy 90.7 | 36 | 3d ago | |
| MATH 200 samples (test) | Plan & Solve Prompting | Accuracy 75 | 36 | 3d ago | |
| AIME 2025 | ExOPD | Accuracy 56.1 | 33 | 3d ago | |
| GSM Hard | FSLR | Accuracy 66.9 | 31 | 3d ago | |
| MATH lighteval | | During-task Accuracy 98.4 | 29 | 3d ago | |
| AIME 2024 | PDP | Final Accuracy 100 | 28 | 3d ago | |
| AMC 2023 | DEPO | Accuracy 90.5 | 26 | 3d ago | |
| BeyondBench Hard | EFFGEN | Accuracy 58.86 | 25 | 3d ago | |
| BeyondBench Easy | EFFGEN | Accuracy 96.67 | 25 | 3d ago | |
| OlympiadBench | DECS | Accuracy 70.3 | 22 | 3d ago | |
| We-Math | ADHint | Pass@1 76.4 | 19 | 3d ago | |
| MathVista | GRPO | Pass@1 74.6 | 19 | 3d ago | |
| OlympiadBench | s1.1-7B | Pass@1 Accuracy 48.2 | 19 | 3d ago | |
| Math Reasoning Tasks (MultiArith, GSM8K, AddSub, AQUA, SingleEq, SVAMP, MAWPS) (test) | S2FT | MultiArith 99.7 | 17 | 3d ago | |
| Open-RS | Jensen-Shannon | Relative Overall Score 98.47 | 16 | 3d ago | |