| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| GPQA | M2CL | Accuracy | 95.1 | 243 | 1mo ago |
| GPQA | GSPO + NSR | Accuracy (GPQA) | 59.22 | 72 | 5d ago |
| ARC-C | Qwen3-Next-80B + THINKBRAKE | Accuracy | 97 | 58 | 12d ago |
| GPQA Diamond | | Accuracy | 83 | 56 | 1mo ago |
| GPQA (test) | InjectRLOpt | Accuracy | 64.44 | 56 | 20d ago |
| GPQA-D | Qwen3-Next-80B | Accuracy | 76.3 | 52 | 8d ago |
| GPQA | SwiR | Pass@1 | 70.2 | 50 | 1mo ago |
| GPQA Diamond | | Pass@1 | 91.9 | 48 | 1mo ago |
| GPQA Diam | Policy Split | Accuracy | 58.9 | 27 | 1mo ago |
| GPQA | Local Highest | GPQA Score | 69.4 | 27 | 1mo ago |
| GPQA diamond | SC | Accuracy | 72.5 | 25 | 19d ago |
| GPQA | Phi | r* Accuracy | 30.9 | 24 | 27d ago |
| GPQA | | Score | 81.4 | 17 | 3mo ago |
| GPQA Diamond | GRPO + RePro | Avg@4 Accuracy | 44.8 | 17 | 3mo ago |
| GPQA | DARL | Avg@4 | 39.4 | 16 | 3mo ago |
| GPQA | DTSR | Accuracy | 69.2 | 15 | 1mo ago |
| ARC Challenge (test) | Genius | Accuracy | 84.04 | 15 | 3mo ago |
| GPQA Diamond | Qwen3-32B | GPQA Diamond | 54.6 | 13 | 16d ago |
| GPQA-D | | Pass@1 | 59.59 | 12 | 3mo ago |
| GPQA Diamond | Format-Adaptive-Answer | AUCOAA | 71.6 | 11 | 3mo ago |
| AI2ARC | D-RPC | Accuracy | 92.92 | 10 | 23d ago |
| ARC | GRPO (RLVRR) | Accuracy | 84.9 | 10 | 3mo ago |
| GPQA Diamond | | BCA | 81.2 | 9 | 21d ago |
| GPQA 5-shot | | Accuracy | 33.3 | 9 | 21d ago |
| GPQA | | FRS Score | 62.9 | 9 | 1mo ago |