| Dataset Name | SOTA Method | Metric | Value | | Updated | Trend |
|---|---|---|---|---|---|---|
| tinyBenchmarks | AAS | AI2_arc Accuracy | 90 | 48 | 3mo ago | |
| General Capability Suite (MMLU, GSM8K, HumanEval, IFEval) | NOVA | Common Average Score | 77.78 | 39 | 1d ago | |
| General Capability Suite (ARC-C, HellaSwag, MMLU, GSM8K) | | ARC-C Accuracy | 54.27 | 27 | 1d ago | |
| General Capability Suite | TELLME | Average Score | 71 | 12 | 6d ago | |
| Capability Benchmarks | GCWM | Score | 74.32 | 10 | 22d ago | |
| General Capability Dataset | | General Score | 66.8 | 10 | 6d ago | |
| Voicebench | Kimi-Audio | HS Score | 76.91 | 8 | 2mo ago | |
| Tülu General Benchmarks 3 | | MMLU | 45 | 6 | 2mo ago | |
| Average (MMLU, GSM8K, MBPP) | Baseline | Accuracy | 78.84 | 2 | 2mo ago | |