| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| OpenScience | P-POTS+Mirror | Pass@1 | 53.8 | 16 | 12d ago |
| GPQA | DIVER | Accuracy | 48.5 | 14 | 21d ago |
| GPQA | FailFast | Speedup | 3.11 | 12 | 3mo ago |
| HotpotQA (dev) | PMSR | Accuracy | 59.8 | 6 | 1mo ago |
| MMLU Redux | MASRouter | Training Cost (USD) | 0.46 | 4 | 3mo ago |