| Dataset | SOTA Method | Metric | Value |
|---|---|---|---|
| General Benchmarks (MMLU, HellaSwag, OBQA, WinoGrande, ARC-C, PiQA, SciQ, LogiQA) | RegMix | MMLU Accuracy | 35.68 |
| Open LLM Leaderboard Population (Top-50) | – | Accuracy | 60.08 |
| TRACE | MagMax | C-STANCE Accuracy | 59 |
| Hugging Face Open LLM Leaderboard | LCG-MultinomialNB-6k | HellaSwag Accuracy | 62 |
| Open LLM Leaderboard Lighteval (test) | – | Mean Accuracy | 91.07 |
| General domain benchmarks (test) | AM-Thinking (math) | DROP Score | 93.3 |
| MMLU-Redux | Qwen 3 14B | Accuracy | 83.7 |
| LLM Evaluation Suite (ARC, CSQA, GSM8K, HS, MMLU, OBQA, PIQA, SIQA, TQA, WG) | Muon (OSP) | ARC | 45.9 |
| Academic Benchmarks (test) | Camelidae-8x34B-pro | Average Score | 59.9 |
| OpenLLM Leaderboard (BBH, GPQA, IFEVAL, MMLU, MUSR) (test) | – | BBH | 72.7 |