| Dataset Name | SOTA Method | Metric | Trend | Last Updated | |
|---|---|---|---|---|---|
| MMLU | DictaLM 3.0 24B-Think | Accuracy 85.93 | 161 | 7d ago | |
| MMLU-Pro | Qwen3.5-4B | Score 79.7 | 63 | 7d ago | |
| GPQA | CoT2-Meta | Accuracy 90.4 | 51 | 2mo ago | |
| MMLU 5-shot | UltraMix-190k | Knowledge (5-shot) Score 74.01 | 46 | 3mo ago | |
| GPQA Diamond | Qwen3.5-9B | Accuracy (GPQA Knowledge) 81.3 | 37 | 2d ago | |
| GPQA 0-shot | UltraMix-190k | Score 33.03 | 37 | 3mo ago | |
| MMLU-Pro 5-shot | UltraMix-190k | Knowledge Score (5-shot) 44.65 | 37 | 3mo ago | |
| GPQA | Ling-flash-2.0 | Score 69.16 | 35 | 3mo ago | |
| ARC Easy | DeepSeek-R1-Distill-Qwen-32B (Reasoning) | ARC-E Score 99.54 | 31 | 3mo ago | |
| ARC Challenge | ReasonAny | ARC-C Score 96.43 | 31 | 3mo ago | |
| MMLU, TruthfulQA | ADG | MMLU 36.1 | 30 | 1mo ago | |
| TruthfulQA 0-shot | UltraMix-187k | Accuracy 63.85 | 28 | 3mo ago | |
| CMMLU | MiniCPM-4.1 | Knowledge Score 84.72 | 25 | 7d ago | |
| MMB | TAIA | Accuracy 61.98 | 21 | 3mo ago | |
| CommonSenseQA CoQA | | Score 66.91 | 20 | 3mo ago | |
| MMLU-Pro (test) | GAC + Token-φ | Accuracy 58.6 | 19 | 7d ago | |
| TruthfulQA | UM-187k | Score 60.66 | 18 | 3mo ago | |
| C-Eval | DSR | C-Eval Knowledge Accuracy 0.6824 | 18 | 7d ago | |
| C-EVAL | | Score 88.12 | 17 | 1mo ago | |
| MMLU, NaturalQuestions, TriviaQA | MobileMoE-L | MMLU (5-shot) 60.1 | 13 | 7d ago | |
| OpenBookQA (test) | Qwen3-Omni-Instruct | Accuracy 92.31 | 11 | 3mo ago | |
| MMSU (test) | Qwen3-Omni-Instruct | Performance 77 | 11 | 3mo ago | |
| MMLU-Redux | Qwen3-14B + NGM | Knowledge (MMLU-Redux) Score 85.3 | 10 | 15d ago | |
| TriviaQA | LLaDA2.1-flash | Score 72.93 | 10 | 7d ago | |
| PHYBench | LLaDA2.0-flash | Score 30.06 | 10 | 3mo ago | |