| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| MMLU | GoAgent | Accuracy 91.5 | 520 | 19h ago | |
| MMLU (test) | | Accuracy 92.16 | 312 | 1d ago | |
| MMLU | HCP-MAD | Accuracy 86.3 | 263 | 6d ago | |
| MMLU-Pro | | Accuracy 89.31 | 248 | 4d ago | |
| MMLU (val) | GWT | Accuracy 74.12 | 94 | 4d ago | |
| MMLU physics | A-Trust | MDR 90.1 | 45 | 1mo ago | |
| MMLU Exact split, o=3 | | Accuracy 92.1 | 42 | 3mo ago | |
| MMLU Pro | | pass@1 86.7 | 38 | 21d ago | |
| CMMLU (test) | | Accuracy 78.3 | 38 | 3mo ago | |
| MMLU Professional Medecine FR (test) | Qwen3-32B | Accuracy 84.19 | 35 | 1mo ago | |
| MMLU Professional Medecine EN (test) | Qwen3-32B | Accuracy 85.29 | 35 | 1mo ago | |
| MMLU Medical Genetics FR (test) | MedGemma-27B-it | Accuracy 86 | 35 | 1mo ago | |
| MMLU Medical Genetics EN (test) | Qwen3-32B | Accuracy (MMLU Medical Genetics) 94 | 35 | 1mo ago | |
| MMLU CoT | | MMLU (CoT) 72.8 | 29 | 3mo ago | |
| MMLU | Tulu 3-SFT | pass@1 71.9 | 24 | 3mo ago | |
| GMMLU c | HQ seed (LLM Training) | Acc (Normalized) 30.75 | 22 | 1mo ago | |
| MMLU Semantic-level split, o=3 | | Accuracy 90.1 | 21 | 3mo ago | |
| MMLU | | vbal Score 65.7 | 18 | 1mo ago | |
| GlobalMMLU | Marco-Mini-Instruct | Accuracy 73.3 | 18 | 1mo ago | |
| MMLU f_con o=5 | | Accuracy (Exact) 99.5 | 18 | 3mo ago | |
| MMMLU Swahili 1.0 (test) | CLO | Accuracy 33.38 | 18 | 3mo ago | |
| MMMLU Korean 1.0 (test) | CLO | Accuracy 41.94 | 18 | 3mo ago | |
| BIG-bench-lite 24 tasks | PaLM 540B | Score 3,777 | 17 | 3mo ago | |
| MMLU | | Accuracy 69.62 | 16 | 1mo ago | |
| C-MMLU | Meta-Llama-3-8B | Accuracy (C-MMLU) 51.2 | 16 | 2mo ago | |