| Dataset Name | SOTA Method | Metric | Trend | |
|---|---|---|---|---|
| tinyBenchmarks | AAS | AI2_arc Accuracy 90 | 48 | 1mo ago |
| General Capability Suite (MMLU, GSM8K, HumanEval, IFEval) | | MMLU 78.21 | 12 | 16d ago |
| Voicebench | Kimi-Audio | HS Score 76.91 | 8 | 1mo ago |
| General Capability Dataset | Base | General Score 1.501 | 6 | 24d ago |
| Tülu General Benchmarks 3 | | MMLU 45 | 6 | 1mo ago |
| Average (MMLU, GSM8K, MBPP) | Baseline | Accuracy 78.84 | 2 | 1mo ago |