| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MATH | Qwen 3 VL 32B Instruct | MATH Accuracy | 95.1 | 136 | 6d ago |
| MATH 500 | DeepSeek-R1 | Pass@1 | 97.3 | 122 | 20d ago |
| AIME25 | gpt-oss-puzzle-88B | Accuracy | 93.33 | 103 | 22h ago |
| GSM8K | JT-Safe-V2-35B | GSM8K Score | 94.62 | 87 | 8d ago |
| AIME 2025 | Document RAG (Google) | Accuracy | 76.7 | 66 | 1mo ago |
| AIME 2024 | Reasoning Memory | Accuracy | 83.8 | 60 | 2mo ago |
| AMC | | pass@1 | 87.95 | 53 | 3mo ago |
| OlympiadBench | COSE | Pass@1 Accuracy | 78.72 | 51 | 1d ago |
| AIME 2024 | DeepSeek-R1 | Pass@1 | 0.798 | 49 | 3mo ago |
| Minerva Math | DPO | Pass@1 Accuracy | 45.2 | 44 | 1d ago |
| AIME 2024 | | Accuracy | 65.56 | 40 | 22h ago |
| HMMT | EVOLIB | Accuracy | 77.4 | 32 | 19d ago |
| AIME 24 | INSIGHT | Pass@1 | 53.75 | 32 | 3mo ago |
| GSM8K 0 (test) | ESPO | Accuracy | 83.7 | 32 | 3mo ago |
| Base Aggregate Math (test) | Olmo 3 | Score | 69.7 | 32 | 3mo ago |
| AIME 2024 | STEP3-VL-10B | AIME 2024 Score (%) | 90.94 | 31 | 2mo ago |
| MATH 500 | Draft-OPD | Throughput (tok/s) | 10,943 | 30 | 5d ago |
| Beyond | Agent Q-Mix | Accuracy | 42 | 26 | 2mo ago |
| AIME 26 | AutoGen | Accuracy | 70 | 26 | 2mo ago |
| AIME-25, HMMT25(Feb), BeyondAIME, AMO-Bench, IMO-AnswerBench | | Pass@1 | 97.5 | 25 | 2mo ago |
| GSM8K | COSE | Accuracy | 94.4 | 24 | 6d ago |
| Minerva Math | LoRA FP16 | 4-shot Performance (%) | 15.52 | 21 | 8d ago |
| AIME 25 | | Avg@32 | 80.2 | 20 | 6d ago |
| AIME 24 | | Avg@32 | 83.7 | 20 | 6d ago |
| AIME25 | | Exact Match | 86.67 | 18 | 2mo ago |