| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Natural Questions (NQ) (Evaluation) | GRADEpre | Accuracy | 83 | 45 | 13d ago |
| MMLU | | MMLU Accuracy | 82.03 | 26 | 3d ago |
| KMMLU, KMMLU Redux, KMMLU Pro, CLIcK, KoBALT, MMLU Pro, GPQA Diamond | DeepSeek-V3.1 | Accuracy | 85.1 | 21 | 1mo ago |
| MMLU-Redux | | Brier Score | 0.1083 | 18 | 1mo ago |
| ArabicMMLU | | Accuracy | 81.23 | 10 | 1mo ago |
| OALL v2 | | Accuracy | 77.44 | 9 | 1mo ago |
| Winogrande (Evaluation) | Disagreement | Accuracy | 58 | 6 | 1mo ago |
| WikiText (eval) | Disagreement | BPB | 0.777 | 6 | 1mo ago |
| PopQA (Evaluation) | GAME-LoRA | Accuracy | 11.2 | 6 | 1mo ago |
| MMLU STEM | TSD-KD | Accuracy | 49.7 | 5 | 1mo ago |
| Overall Knowledge Aggregation (Aggregate) | CAD | Improvement (%) | 40 | 5 | 1mo ago |
| Composite (MMLU, MMLU-Pro, CMMLU, C-EVAL, GAOKAO-Bench, ARC-c, GPQA, SciBench, PHYBench, TriviaQA) | Ling-mini-2.0 | Overall Average Score | 65.77 | 4 | 1mo ago |