| Dataset Name | SOTA Method | Metric | Value | | Updated |
|---|---|---|---|---|---|
| Natural Questions (NQ) (Evaluation) | | Accuracy | 59.4 | 22 | 4d ago |
| KMMLU, KMMLU Redux, KMMLU Pro, CLIcK, KoBALT, MMLU Pro, GPQA Diamond | DeepSeek-V3.1 | Accuracy | 85.1 | 21 | 4d ago |
| MMLU-Redux | | Brier Score | 0.1083 | 18 | 4d ago |
| Winogrande (Evaluation) | Disagreement | Accuracy | 58 | 6 | 4d ago |
| WikiText (eval) | Disagreement | BPB | 0.777 | 6 | 4d ago |
| PopQA (Evaluation) | GAME-LoRA | Accuracy | 11.2 | 6 | 4d ago |
| Overall Knowledge Aggregation (Aggregate) | CAD | Improvement (%) | 40 | 5 | 4d ago |
| Composite (MMLU, MMLU-Pro, CMMLU, C-EVAL, GAOKAO-Bench, ARC-c, GPQA, SciBench, PHYBench, TriviaQA) | Ling-mini-2.0 | Overall Average Score | 65.77 | 4 | 4d ago |