| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MMLU | M2CL | Accuracy | 96.6 | 844 | 21d ago |
| Nine Zero-Shot Tasks (BoolQ, HellaSwag, LAMBADA, OpenBookQA, PIQA, SIQA, WinoGrande, ARC-Easy, ARC-Challenge) | | Average Accuracy | 73.81 | 173 | 1mo ago |
| MMLU (test) | | MMLU Average Accuracy | 88 | 167 | 4d ago |
| MMLU 5-shot | ERNIE 5.0-Base | Accuracy | 90.58 | 153 | 8d ago |
| MMLU 5-shot (test) | | Accuracy | 74.2 | 149 | 1mo ago |
| MMLU | GD | MMLU Accuracy | 90 | 147 | 17h ago |
| MMLU | Qwen3-14B | MMLU Accuracy | 87.56 | 132 | 7d ago |
| MMLU 0-shot | Token Filtering | Accuracy | 70.46 | 119 | 1d ago |
| MMLU-Pro | gpt-oss-120B | Accuracy | 80.6 | 116 | 17h ago |
| MMLU | | MMLU Score | 73.02 | 98 | 2mo ago |
| MMLU | gpt-oss-120b | MMLU Score | 88.6 | 70 | 1mo ago |
| MMLU CF | CAP-CoT | Score | 74.5 | 66 | 1mo ago |
| CMMLU | Qwen2-72B | Accuracy | 90.1 | 62 | 5d ago |
| MMLU-Pro | FOREVER | MMLU-Pro Accuracy | 72.8 | 60 | 8d ago |
| Polish Open Leaderboard | | Average Performance | 69.84 | 53 | 1mo ago |
| MMLU | MASA | Average Accuracy | 71.91 | 50 | 3mo ago |
| MMLU | GPT-4o-mini | Accuracy | 82.1 | 43 | 7d ago |
| CEval | Qwen3-30B-A3B | Accuracy | 83.56 | 43 | 5d ago |
| MMLU o=1 Exact split | ITD | Accuracy | 77.6 | 42 | 3mo ago |
| MMLU | | MMLU Score | 69 | 40 | 4d ago |
| WinoGrande | | Accuracy | 80.82 | 38 | 12d ago |
| MMLU | Mixtral-8x22B | Humanities Avg | 68.6 | 33 | 3mo ago |
| MMLU | Verify-Only | Accuracy | 84.9 | 31 | 3mo ago |
| MMLU | CortexDebate | RA | 82.33 | 31 | 3mo ago |
| MMLU | LMNet | Delta Accuracy | 37.28 | 30 | 19d ago |