| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| MATH 500 | DeepSeek-R1 | Pass@1 | 97.3 | 95 | 1mo ago |
| MATH | Qwen 3 VL 32B Instruct | MATH Accuracy | 95.1 | 85 | 4d ago |
| AIME 2025 | Document RAG (Google) | Accuracy | 76.7 | 66 | 11d ago |
| AIME25 | gpt-oss-puzzle-88B | Accuracy | 93.33 | 63 | 5d ago |
| AIME 2024 | Reasoning Memory | Accuracy | 83.8 | 60 | 15d ago |
| AMC | | Pass@1 | 87.95 | 53 | 1mo ago |
| AIME 2024 | DeepSeek-R1 | Pass@1 | 0.798 | 49 | 1mo ago |
| GSM8K | Qwen2-72B-Instruct | GSM8K Score | 93.2 | 39 | 1mo ago |
| OlympiadBench | INSIGHT | Pass@1 Accuracy | 57 | 32 | 1mo ago |
| AIME 24 | INSIGHT | Pass@1 | 53.75 | 32 | 1mo ago |
| GSM8K 0 (test) | ESPO | Accuracy | 83.7 | 32 | 1mo ago |
| Base Aggregate Math (test) | Olmo 3 | Score | 69.7 | 32 | 1mo ago |
| AIME 2024 | STEP3-VL-10B | AIME 2024 Score (%) | 90.94 | 31 | 18d ago |
| Beyond | Agent Q-Mix | Accuracy | 42 | 26 | 16d ago |
| HMMT | Agent Q-Mix | Accuracy | 53.33 | 26 | 16d ago |
| AIME 26 | AutoGen | Accuracy | 70 | 26 | 16d ago |
| AIME-25, HMMT25(Feb), BeyondAIME, AMO-Bench, IMO-AnswerBench | | Pass@1 | 97.5 | 25 | 25d ago |
| AIME25 | | Exact Match | 86.67 | 18 | 29d ago |
| Mathematics Suite | Olmo 3 7B-PC | GSM8K Accuracy | 73 | 18 | 1mo ago |
| AIME 25 | Scaf-GRPO | Pass@1 | 23.3 | 18 | 1mo ago |
| MATH | N-3-Super 120B-A12B-Base | Exact Match | 84.84 | 15 | 3d ago |
| AIME 2024 | Cog-DRIFT | Accuracy | 51.74 | 14 | 11d ago |
| Minerva Math | INSIGHT | Pass@1 Accuracy | 41.8 | 14 | 1mo ago |
| AMC 23 | DS | Pass@1 | 80.79 | 14 | 1mo ago |
| Mathematics tasks | GPT-5 | Score | 97.9 | 14 | 1mo ago |
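
Several rows in the table above report Pass@1, on both a 0 to 1 scale and a 0 to 100 scale. The table does not say how each entry computed its score, so the following is only a minimal sketch of one common convention: the unbiased pass@k estimator, which for k = 1 reduces to the fraction of sampled answers that are correct. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and chosen for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n sampled answers, c of them correct.

    Uses the unbiased estimator 1 - C(n - c, k) / C(n, k). This is an assumed
    convention; individual leaderboard entries may compute their scores differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Multiplying the result by 100 gives the percentage form, so a value such as 0.798 corresponds to 79.8 on the 0 to 100 scale used by most other rows.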