| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| GPQA | M2CL | Accuracy | 95.1 | 218 | 3d ago |
| GPQA (test) | InjectRLOpt | Accuracy | 64.44 | 41 | 3d ago |
| GPQA | Chain of Mindset (CoM) | Pass@1 | 69.7 | 35 | 3d ago |
| GPQA Diamond | | Pass@1 | 91.9 | 21 | 3d ago |
| GPQA Diamond | GRPO + RePro | Avg@4 Accuracy | 44.8 | 17 | 3d ago |
| GPQA | DARL | Avg@4 | 39.4 | 16 | 3d ago |
| ARC Challenge (test) | Genius | Accuracy | 84.04 | 15 | 3d ago |
| GPQA-D | | Pass@1 | 59.59 | 12 | 3d ago |
| GPQA Diamond | | Accuracy | 56.06 | 11 | 3d ago |
| GPQA Diamond | Format-Adaptive-Answer | AUCOAA | 71.6 | 11 | 3d ago |
| ARC | GRPO (RLVRR) | Accuracy | 84.9 | 10 | 3d ago |
| GPQA | | Score | 81.4 | 7 | 3d ago |
| Science (out-of-distribution) | CalibRL | Accuracy | 65.12 | 6 | 3d ago |
| MMLU | TLMRE | Accuracy | 70 | 6 | 3d ago |
| TAL-SCQ5K EN | MIG | Pass@1 | 73 | 3 | 3d ago |
| TAL-SCQ5K CN | MIG | Pass@1 | 64 | 3 | 3d ago |