| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| GSM8K | GHG-TDA | Accuracy | 97.1 | 1,362 | 4d ago |
| GSM8K (test) | | Accuracy | 99 | 900 | 5d ago |
| MATH | CoD | Accuracy | 95.63 | 882 | 4d ago |
| GSM8K (test) | | Accuracy | 97.72 | 770 | 9d ago |
| MATH | Self-Reminder | Accuracy | 94.2 | 535 | 1mo ago |
| MATH500 (test) | DeepSeek r1 | Accuracy | 97.3 | 514 | 5d ago |
| GSM8K | MARS | Accuracy | 98 | 499 | 24d ago |
| MATH 500 | SwiR | Accuracy | 98.4 | 442 | 24d ago |
| MATH (test) | IIPC | Overall Accuracy | 94.13 | 433 | 1mo ago |
| SVAMP | GPT-4o + QuaSAR | Accuracy | 97 | 403 | 4d ago |
| MathVista | Qwen-VL-7B-Chat | Score | 229.2 | 385 | 11d ago |
| AIME 2024 | GPT-5-Mini-R | Accuracy | 94 | 370 | 24d ago |
| GSM8K | | Accuracy (GSM8K) | 97.8 | 358 | 1mo ago |
| MATH | | Accuracy | 96.67 | 338 | 26d ago |
| GSM8k | SQ-format | Accuracy | 96.21 | 312 | 8d ago |
| MathQA | | Accuracy | 98.84 | 305 | 2d ago |
| AIME | STAR-1 | AIME Accuracy | 83.3 | 288 | 1mo ago |
| CollegeMATH | | Accuracy | 52.5 | 276 | 25d ago |
| SVAMP (test) | Self-Contrast | Accuracy | 94 | 262 | 12d ago |
| MathVista | | Accuracy | 89.2 | 257 | 4d ago |
| GSM8K | ES-dLLM | Speed Up (x) | 13.4 | 246 | 29d ago |
| ASDiv | | Accuracy | 0.955 | 245 | 5d ago |
| MATH 500 | r1 | pass@1 | 97.3 | 239 | 25d ago |
| MAWPS | | Accuracy | 98.5 | 234 | 25d ago |
| AIME 2025 | | Accuracy | 95 | 227 | 1mo ago |