| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| MMLU | DictaLM 3.0 24B-Think | Accuracy | 85.93 | 136 | 10d ago |
| GPQA | CoT2-Meta | Accuracy | 90.4 | 51 | 18d ago |
| MMLU-Pro | Ling-flash-2.0 | Score | 77.55 | 48 | 1mo ago |
| MMLU 5-shot | UltraMix-190k | Knowledge (5-shot) Score | 74.01 | 46 | 1mo ago |
| GPQA 0-shot | UltraMix-190k | Score | 33.03 | 37 | 1mo ago |
| MMLU-Pro 5-shot | UltraMix-190k | Knowledge Score (5-shot) | 44.65 | 37 | 1mo ago |
| GPQA | Ling-flash-2.0 | Score | 69.16 | 35 | 1mo ago |
| ARC Easy | DeepSeek-R1-Distill-Qwen-32B (Reasoning) | ARC-E Score | 99.54 | 31 | 1mo ago |
| ARC Challenge | ReasonAny | ARC-C Score | 96.43 | 31 | 1mo ago |
| MMLU, TruthfulQA | ADG | MMLU | 36.1 | 30 | 4d ago |
| TruthfulQA 0-shot | UltraMix-187k | Accuracy | 63.85 | 28 | 1mo ago |
| MMB | TAIA | Accuracy | 61.98 | 21 | 1mo ago |
| CommonSenseQA CoQA | | Score | 66.91 | 20 | 1mo ago |
| TruthfulQA | UM-187k | Score | 60.66 | 18 | 1mo ago |
| C-EVAL | | Score | 88.12 | 17 | 4d ago |
| CMMLU | MiniCPM-4.1 | Knowledge Score | 84.72 | 16 | 4d ago |
| OpenBookQA (test) | Qwen3-Omni-Instruct | Accuracy | 92.31 | 11 | 1mo ago |
| MMSU (test) | Qwen3-Omni-Instruct | Performance | 77 | 11 | 1mo ago |
| TriviaQA | LLaDA2.1-flash | Score | 72.93 | 10 | 1mo ago |
| PHYBench | LLaDA2.0-flash | Score | 30.06 | 10 | 1mo ago |
| GPQA | HT-MNPO | GPQA Score | 36.36 | 9 | 10d ago |
| Ko-Sovereign | | EM | 71.9 | 9 | 29d ago |
| KMMLU | | Knowledge EM | 77.9 | 9 | 29d ago |
| GPQA | ReasonAny | GPQA Score | 57.5 | 9 | 1mo ago |
| C-Eval | Qwen3-1.7B-ALLMEM | C-Eval Knowledge Accuracy | 0.589 | 9 | 19d ago |