| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| BoolQ | ORACLE | Accuracy | 87.7 | 44 | 25d ago |
| MMLU-Pro (test) | Alpaca-GPT4 + NAIT (GSM) | Accuracy | 24.5 | 12 | 1mo ago |
| MMLU (test) | Alpaca-GPT4 + SelectIT | Accuracy | 47.9 | 12 | 1mo ago |
| KoLA WaterBench (test) | | GM | 37.5 | 11 | 1mo ago |
| Known 1000 | | Disagreement Rate | 5.26 | 10 | 27d ago |
| mParaRel English (all) | Qwen 3 1.7B | Accuracy | 77.7 | 9 | 1mo ago |
| MMLU-Pro | | EM | 58.4 | 5 | 1mo ago |
| MMLU | | EM | 81.04 | 5 | 1mo ago |