| Dataset Name | SOTA Method | Metric | Score | Results | Updated |
|---|---|---|---|---|---|
| BoolQ | ORACLE | Accuracy | 87.7 | 44 | 2mo ago |
| Include Lite | GRPO | Seen Accuracy | 41.88 | 21 | 8d ago |
| Global-MMLU-Lite | DPO | Seen Accuracy | 58.5 | 21 | 8d ago |
| MMLU-Pro (test) | Alpaca-GPT4 + NAIT (GSM) | Accuracy | 24.5 | 12 | 2mo ago |
| MMLU (test) | Alpaca-GPT4 + SelectIT | Accuracy | 47.9 | 12 | 2mo ago |
| KoLA WaterBench (test) | | GM | 37.5 | 11 | 3mo ago |
| Known 1000 | | Disagreement Rate | 5.26 | 10 | 2mo ago |
| mParaRel English (all) | Qwen 3 1.7B | Accuracy | 77.7 | 9 | 3mo ago |
| MMLU-Pro | | EM | 58.4 | 5 | 3mo ago |
| MMLU | | EM | 81.04 | 5 | 3mo ago |