| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| GPQA | M2CL | Accuracy | 95.1 | 243 | 3d ago |
| GPQA | SwiR | Pass@1 | 70.2 | 50 | 8d ago |
| GPQA (test) | InjectRLOpt | Accuracy | 64.44 | 41 | 1mo ago |
| GPQA Diamond | | Accuracy | 83 | 34 | 25d ago |
| GPQA Diam | Policy Split | Accuracy | 58.9 | 27 | 4d ago |
| GPQA | Local Highest | GPQA Score | 69.4 | 27 | 2d ago |
| GPQA Diamond | | Pass@1 | 91.9 | 21 | 1mo ago |
| GPQA | | Score | 81.4 | 17 | 1mo ago |
| GPQA Diamond | GRPO + RePro | Avg@4 Accuracy | 44.8 | 17 | 1mo ago |
| GPQA | DARL | Avg@4 | 39.4 | 16 | 1mo ago |
| GPQA | DTSR | Accuracy | 69.2 | 15 | 9d ago |
| ARC Challenge (test) | Genius | Accuracy | 84.04 | 15 | 1mo ago |
| GPQA-D | | Pass@1 | 59.59 | 12 | 1mo ago |
| GPQA Diamond | Format-Adaptive-Answer | AUCOAA | 71.6 | 11 | 1mo ago |
| ARC | GRPO (RLVRR) | Accuracy | 84.9 | 10 | 1mo ago |
| GPQA | | FRS Score | 62.9 | 9 | 3d ago |
| MMLU v1 (test) | NEMOTRON-NANO-12B-V2 (RLP) | Accuracy | 79.48 | 8 | 1mo ago |
| GPQA | STDec | TPS | 92.08 | 6 | 9d ago |
| Science (out-of-distribution) | CalibRL | Accuracy | 65.12 | 6 | 1mo ago |
| MMLU | TLMRE | Accuracy | 70 | 6 | 1mo ago |
| GPQA-Diamond (test) | RADAR | Hypervolume | 0.7513 | 4 | 1mo ago |
| TAL-SCQ5K EN | MIG | Pass@1 | 73 | 3 | 1mo ago |
| TAL-SCQ5K CN | MIG | Pass@1 | 64 | 3 | 1mo ago |