| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| GSM8K | GHG-TDA | Accuracy | 97.1 | 983 | 2d ago |
| GSM8K (test) | | Accuracy | 99 | 797 | 2d ago |
| GSM8K (test) | | Accuracy | 97.72 | 751 | 3d ago |
| MATH | CoD | Accuracy | 95.63 | 643 | 2d ago |
| MATH | Self-Reminder | Accuracy | 94.2 | 535 | 3d ago |
| MATH (test) | IIPC | Overall Accuracy | 94.13 | 433 | 3d ago |
| MATH500 (test) | DeepSeek r1 | Accuracy | 97.3 | 381 | 2d ago |
| SVAMP | GPT-4o + QuaSAR | Accuracy | 97 | 368 | 2d ago |
| GSM8K | | Accuracy (GSM8K) | 97.8 | 358 | 3d ago |
| GSM8K | SC | Accuracy | 97.04 | 351 | 3d ago |
| MathVista | Qwen-VL-7B-Chat | Score | 229.2 | 322 | 3d ago |
| AIME | STAR-1 | AIME Accuracy | 83.3 | 283 | 2d ago |
| AIME 2024 | Agentic Proposing | Accuracy | 93.5 | 251 | 2d ago |
| SVAMP (test) | Self-Contrast | Accuracy | 94 | 233 | 3d ago |
| AIME 2025 | | Accuracy | 95 | 227 | 3d ago |
| ASDiv | | Accuracy | 0.955 | 221 | 3d ago |
| MAWPS | | Accuracy | 98.5 | 219 | 2d ago |
| GSM8k | SQ-format | Accuracy | 96.21 | 212 | 2d ago |
| AIME 25 | APRM | Accuracy | 94.5 | 201 | 3d ago |
| AMC 23 | Gemini 2.5 pro | Accuracy | 100 | 198 | 3d ago |
| GSM8K | GKT | Speed Up (x) | 10.72 | 177 | 3d ago |
| GSM8K | Llama-3.3-70B | Math Score | 96.4 | 171 | 2d ago |
| GSM-Hard | BRAID | Accuracy | 99 | 169 | 3d ago |
| GSM-Hard | GPT-4o | Solve Rate | 78 | 162 | 3d ago |
| MATH | | Accuracy | 96.67 | 162 | 3d ago |