| GSM8K 0 (test) | ESPO | Accuracy 83.7 | | 32 | 3d ago |
| Base Aggregate Math (test) | Olmo 3 | Score 69.7 | | 32 | 3d ago |
| MATH | Qwen 3 VL 32B Instruct | MATH Accuracy 95.1 | | 32 | 2d ago |
| GSM8K | Qwen2-72B-Instruct | GSM8K Score 93.2 | | 21 | 3d ago |
| MATH 500 | DeepSeek-R1 | Pass@1 97.3 | | 20 | 3d ago |
| AIME 2025 | Qwen 3 VL 32B Instruct | Accuracy 64.2 | | 19 | 3d ago |
| AIME 2024 | Qwen 3 VL 32B Instruct | Accuracy 75.4 | | 19 | 3d ago |
| AIME25 | gpt-oss-puzzle-88B | Accuracy 93.33 | | 16 | 3d ago |
| Mathematics tasks | GPT-5 | Score 97.9 | | 14 | 3d ago |
| AIME25, AIME25-ko, HRM8K, KMO25 | GLM-4.6 | Accuracy 96.6 | | 12 | 3d ago |
| MATH | Qwen2-72B-Instruct | Exact Match 69 | | 12 | 3d ago |
| MATH | Dream | Accuracy 39.6 | | 10 | 3d ago |
| Erdős’ minimum overlap problem | | Overlap Score 38.0965 | | 10 | 3d ago |
| BeyondAIME | InternVL 3.5 | Avg@10 66.56 | | 9 | 3d ago |
| HRM8K | gpt-oss-120b | Score 89.5 | | 8 | 3d ago |
| College Math | NPG-Muse-8B | Accuracy 47.1 | | 6 | 3d ago |
| AIME 2024 | DeepSeek-R1 | Pass@1 0.798 | | 6 | 3d ago |
| CNMO 2024 | Qwen3-VL Thinking | Score (%) 0.7922 | | 5 | 3d ago |
| HMMT25 | STEP3-VL-10B | Score 0.7818 | | 5 | 3d ago |
| AIME 2024 | STEP3-VL-10B | AIME 2024 Score (%) 90.94 | | 5 | 3d ago |
| AIME 2024, OmniMath, OlympiadBench, AMC 22-24, MATH-500 | OmniMath | CBRC Score 0.76 | | 5 | 3d ago |
| CNMO 2024 | DeepSeek-R1 | Pass@1 78.8 | | 5 | 3d ago |
| AIME 25 | | Avg@32 80.2 | | 4 | 3d ago |
| AIME 24 | | Avg@32 83.7 | | 4 | 3d ago |
| Ko-AIME in-house 2025 | gpt-oss-120b | Score 0.9 | | 4 | 3d ago |