| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| GPQA | GRPO | Accuracy 68.54 | 22 | 20d ago | |
| OOD AIME, HMMT, GPQA, MMLU-Pro, MMLU-Redux 2.0 | | Pass@1 89.5 | 8 | 5d ago | |
| MMK12 | VL-Cogito-7B + LEAD | MMK12 Math Score 65.1 | 8 | 2mo ago | |
| OlympiadBench | Qwen2.5 32B + RL (synthetic) | Mean Accuracy 44.53 | 2 | 1mo ago | |