| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| Reasoning and Knowledge Suite (MMLU, ARC-C, ARC-E, BoolQ, CSQA, HSwag, PIQA, SocIQ, Wino) (various) | Qwen3-4B | MMLU 75.78 | 14 | 4d ago |
| GSM8K, Math, AIME, HumanEval, LiveCodeBench, ARC-C, ARC-E, MMLU, GPQA | Reasoning | GSM8K 95.41 | 9 | 3d ago |