| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| Reasoning Dataset | Qwen3-30B-A3B-Thinking-2507 | Accuracy (Acc) 86.9 | 21 | 2mo ago | |
| CoT-Collection | SFT-Tag | Composite Score 73.7 | 20 | 3mo ago | |
| Average AIME25 AIME24 GPQA-Diamond | | Accuracy 74.32 | 12 | 1d ago | |
| GPQA Diamond | | Accuracy 67.93 | 12 | 1d ago | |
| AIME 24 | | Accuracy 81.46 | 12 | 1d ago | |
| AIME 25 | | Accuracy 81.04 | 12 | 1d ago | |
| UGPhysics AtomicPhysics | MCNIG | Accuracy 15.1 | 11 | 2mo ago | |
| TumorCoT | | S_FC Score 64.22 | 11 | 2mo ago | |
| EUREQA (held-out half of hard_5) | | Best@321.6 | 8 | 12d ago | |
| Driving Evaluation Benchmark | UniUGP | GPT Score 0.88 | 5 | 3mo ago | |
| BBH CoT | | Accuracy (BBH CoT) 54.42 | 3 | 5d ago | |