| Dataset Name | SOTA Method | Metric | Value | Trend | Last updated |
|---|---|---|---|---|---|
| GSM8K | GHG-TDA | Accuracy | 97.1 | 1,398 | 21d ago |
| GSM8K (test) | | Accuracy | 99 | 954 | 5d ago |
| MATH500 (test) | DeepSeek r1 | Accuracy | 97.3 | 895 | 20h ago |
| MATH | CoD | Accuracy | 95.63 | 882 | 1mo ago |
| GSM8K (test) | ReProbe | Accuracy | 98.8 | 816 | 25d ago |
| MATH 500 | Qwen3-Base SAT | Accuracy (Acc) | 99.1 | 543 | 7d ago |
| MATH | Self-Reminder | Accuracy | 94.2 | 535 | 3mo ago |
| GSM8K | MARS | Accuracy | 98 | 499 | 2mo ago |
| AIME 2024 | DC-Tail | Accuracy | 93.3 | 479 | 22h ago |
| MathVista | Qwen-VL-7B-Chat | Score | 229.2 | 474 | 18h ago |
| MATH 500 | SwiR | Accuracy | 98.4 | 442 | 2mo ago |
| MATH (test) | IIPC | Overall Accuracy | 94.13 | 433 | 3mo ago |
| SVAMP | GPT-4o + QuaSAR | Accuracy | 97 | 403 | 1mo ago |
| GSM8k | Phi-4 pass@N (Upper Bound) | Accuracy | 100 | 388 | 14d ago |
| MATH 500 | SFT+RL | Top-1 Accuracy | 95 | 384 | 5d ago |
| MathVista | | Accuracy | 89.2 | 382 | 11d ago |
| AIME 2024 | GPT-5-Mini-R | Accuracy | 94 | 370 | 2mo ago |
| AMC | Qwen3-30B A3B-Instruct-2507 | Accuracy (%) | 95.18 | 368 | 11d ago |
| GSM8K | | Accuracy (GSM8K) | 97.8 | 358 | 3mo ago |
| MathQA | | Accuracy | 98.84 | 354 | 8d ago |
| MATH | | Accuracy | 96.67 | 338 | 2mo ago |
| GSM8K | SIGMA | Accuracy (Acc) | 96.81 | 337 | 1d ago |
| CollegeMATH | Hermes@5+Majority | Accuracy | 85.2 | 327 | 1d ago |
| AIME 24 | DDC | Accuracy | 93.3 | 318 | 7d ago |
| AIME 2025 | PC-cubic | Accuracy | 96.7 | 311 | 20h ago |