| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| Average (2WikiMultiHop, MMLU, GSM8k) (in-distribution) | CONCUR | Accuracy | 75.2 | 29 | 1mo ago |
| Average Out-of-domain | DFT | Accuracy (OOD) | 49.57 | 24 | 1mo ago |
| MMLU-Pro | Composition-RL | Pass@1 | 69.3 | 18 | 22d ago |
| GPQA Diamond | Standard RLVR | Pass@1 | 55 | 15 | 2mo ago |
| BigBench Hard | | Score | 31.1 | 5 | 3mo ago |
| MMLU | DCRL | Pass@1 | 56.7 | 3 | 2mo ago |