| Dataset Name | SOTA Method | Metric | Value | | Updated |
|---|---|---|---|---|---|
| MMLU | GoAgent | Accuracy | 91.5 | 413 | 12d ago |
| MMLU (test) | | Accuracy | 92.16 | 303 | 1mo ago |
| MMLU-Pro | | Accuracy | 87.1 | 118 | 9d ago |
| MMLU (val) | GWT | Accuracy | 74.12 | 72 | 18d ago |
| MMLU physics | A-Trust | MDR | 90.1 | 45 | 3d ago |
| MMLU Exact split, o=3 | | Accuracy | 92.1 | 42 | 1mo ago |
| CMMLU (test) | | Accuracy | 78.3 | 38 | 1mo ago |
| MMLU Professional Medecine FR (test) | Qwen3-32B | Accuracy | 84.19 | 35 | 9d ago |
| MMLU Professional Medecine EN (test) | Qwen3-32B | Accuracy | 85.29 | 35 | 9d ago |
| MMLU Medical Genetics FR (test) | MedGemma-27B-it | Accuracy | 86 | 35 | 9d ago |
| MMLU Medical Genetics EN (test) | Qwen3-32B | Accuracy (MMLU Medical Genetics) | 94 | 35 | 9d ago |
| MMLU | HCP-MAD | Accuracy | 86.3 | 34 | 3d ago |
| MMLU CoT | | MMLU (CoT) | 72.8 | 29 | 1mo ago |
| MMLU | Tulu 3-SFT | pass@1 | 71.9 | 24 | 1mo ago |
| MMLU Semantic-level split, o=3 | | Accuracy | 90.1 | 21 | 1mo ago |
| MMLU | | vbal Score | 65.7 | 18 | 11d ago |
| MMLU f_con o=5 | | Accuracy (Exact) | 99.5 | 18 | 1mo ago |
| MMMLU Swahili 1.0 (test) | CLO | Accuracy | 33.38 | 18 | 1mo ago |
| MMMLU Korean 1.0 (test) | CLO | Accuracy | 41.94 | 18 | 1mo ago |
| BIG-bench-lite 24 tasks | PaLM 540B | Score | 3,777 | 17 | 1mo ago |
| MMLU | | Accuracy | 69.62 | 16 | 11d ago |
| C-MMLU | Meta-Llama-3-8B | Accuracy (C-MMLU) | 51.2 | 16 | 1mo ago |
| MMLU-ProX non-EU languages (test) | Qwen-3-30B-A3B | Accuracy | 70.9 | 16 | 1mo ago |
| MMMLU non-EU languages (test) | Qwen-3-30B-A3B | Accuracy | 77.4 | 16 | 1mo ago |
| ArabicMMLU | | Accuracy | 72.5 | 16 | 1mo ago |