| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| Gauntlet 20 benchmarks (test) | Prior-based | Average Normalized Accuracy | 9.2 | 10 | 1mo ago |
| MMLU, ARC-C, PIQA, WinoG, GSM8K, HellaSwag, GPQA, RACE zero-shot | | Average Score | 60.94 | 9 | 1mo ago |
| DCLM Pro | PathMoE | WinoGrande | 57.93 | 2 | 1mo ago |