| Dataset Name | SOTA Method | Metric | Trend | Last Updated | |
|---|---|---|---|---|---|
| LogicVista | CodePercept-32B-S1 | Accuracy 70.02 | 23 | 2mo ago | |
| MathVision | CodePercept-32B-S1 | Accuracy 69.96 | 23 | 2mo ago | |
| GPQA Diamond | SwiR | Accuracy 70.2 | 16 | 3mo ago | |
| TheoremQA | RLPR | Avg@2 55.4 | 16 | 3mo ago | |
| MMLU STEM | TaH+ | Accuracy (STEM) 73.7 | 15 | 13d ago | |
| Super GPQA | MTP-D | Speedup Ratio 2.096 | 15 | 2mo ago | |
| GPQA-Diamond, PHYBench, BIOBench | | Pass@1 91.9 | 15 | 2mo ago | |
| GPQA Diamond | PAPO | Accuracy (avg@4) 55 | 12 | 2mo ago | |
| GPQA Diamond | SCVC | AUROC 82.1 | 10 | 2mo ago | |
| GPQA-Diamond 5-shot | ERNIE 5.0-Base | Accuracy 57.3 | 10 | 3mo ago | |
| JEE Main 2026 | Qwen3-30B-A3B (Thinking) | Pass@1 97.26 | 8 | 5d ago | |
| JEE Main 2025 | Aryabhata 2 | Pass@1 87.8 | 8 | 5d ago | |
| NEET 2025 | | Pass@1 90 | 8 | 5d ago | |
| JEE Adv. 2025 | | Pass@1 96.81 | 8 | 5d ago | |
| MMLU-Redux 2.0 | Qwen3-30B-A3B (Thinking) | Pass@1 Accuracy 97.77 | 8 | 5d ago | |
| MMLU-Pro | Qwen3-30B-A3B (Thinking) | Pass@1 Accuracy 90.8 | 8 | 5d ago | |
| GPQA | | Pass@1 Accuracy 77.06 | 8 | 5d ago | |
| TheoremQA | Bingo-A | Accuracy 36.8 | 8 | 1mo ago | |
| Minerva | Training-time reweighting | Avg@32 56.57 | 8 | 2mo ago | |
| GaokaoQA | AGPO | Accuracy 41.1 | 7 | 13d ago | |
| GaokaoCloze | AGPO | Accuracy 25.9 | 7 | 13d ago | |
| SAT | AGPO | Accuracy 0.893 | 7 | 13d ago | |
| OCW | AGPO | Accuracy 17.3 | 7 | 13d ago | |
| GPQA | Qwen3-8B | Score 60.9 | 7 | 3mo ago | |
| AIME 2025 | Qwen3-8B | Score 67.6 | 7 | 3mo ago | |