| GSM8K | | Accuracy 0.95 | | 87 | 3d ago |
| Math (test) | OPCD | Accuracy 80.9 | | 36 | 3d ago |
| MATH-500 | LongCat-Flash Exp-Chat | Accuracy 98.8 | | 25 | 3d ago |
| AIME24 | Primitives-based MAS | Accuracy 76.7 | | 20 | 3d ago |
| OMEGA | Qwen 3 VL 32B Instruct | Accuracy 44 | | 13 | 2d ago |
| MATH | CapFlow | Solve Rate 59.87 | | 11 | 3d ago |
| GSM8K | CapFlow | Solve Rate 94.97 | | 11 | 3d ago |
| Omni-MATH | LLaDA2.1-flash | Score 54.1 | | 10 | 3d ago |
| CMATH | LLaDA2.0-flash | Score 96.9 | | 10 | 3d ago |
| GSM-Plus | LLaDA2.0-flash | Score 89.74 | | 10 | 3d ago |
| OlympiadBench | | Score 77.59 | | 10 | 3d ago |
| AIME 2025 | LLaDA2.1-flash | Score 63.33 | | 10 | 3d ago |
| GSM8K 1,000-example (test) | Qwen3-VL-2B-Instruct | PPL 5.8317 | | 10 | 3d ago |
| GSM8K | ProSeCo Sampling | Pass@1 82.18 | | 9 | 3d ago |
| IMO-ANSWERBENCH | | Score 53.8 | | 9 | 3d ago |
| Math | Global Surgery | Math Score 60.8 | | 8 | 3d ago |
| AIME no tools 2025 | | Pass@1 87.5 | | 7 | 3d ago |
| AIME no tools 2024 | | Pass@1 91.4 | | 7 | 3d ago |
| APT-Bench | Qwen3 | Accuracy 70.5 | | 6 | 3d ago |
| AMC-12 | Probing | Accuracy 73.63 | | 6 | 3d ago |
| AIME-Extend | AdaRAS | Accuracy 52.67 | | 6 | 3d ago |
| AIME 2024 (test) | ReSyn | Mean@128 14 | | 5 | 3d ago |
| AIME 2025 | STEP3-VL-10B | Score (%) 87.66 | | 5 | 3d ago |
| GSM8K (test) | ReSyn | Mean@4 91.4 | | 4 | 3d ago |
| Composite (GSM8K, MATH, OlympiadBench, AIME 2025, HARDMath2, Omni-MATH, GSM-Plus, CMATH) | Ling-mini-2.0 | GSM8K 94.62 | | 4 | 3d ago |