| Dataset Name | SOTA Method | Metric | Value | Count | Updated |
|---|---|---|---|---|---|
| 10 tasks average | | Avg Accuracy | 70.56 | 50 | 4d ago |
| Open PL LLM Leaderboard instruction-tuned | | Overall Average Score | 69.84 | 44 | 4d ago |
| HuggingFace Open LLM Leaderboard | SPACE | ARC | 65.96 | 21 | 4d ago |
| Open LLM Leaderboard | | Average Score | 74.2 | 19 | 4d ago |
| HuggingFace Open LLM Leaderboard lm-eval-harness default (various) | | HellaSwag | 84.34 | 18 | 4d ago |
| 12-task evaluation suite composite (test) | FineWeb-Edu | Reading Comprehension Score | 49.6 | 14 | 3d ago |
| Open LLM Leaderboard v1 (test) | | Average Score | 69.6 | 14 | 4d ago |
| OpenCompass | Qwen3-30B-A3B | cMMLU | 84.88 | 11 | 3d ago |
| GSM8K, TruthfulQA, CommonsenseQA, MMLU, ARC, and TriviaQA (various) | JoBS | Accuracy | 88 | 9 | 4d ago |
| NorEval (test) | NorwAI-Mistral-7B | Overall Score | 0.455 | 8 | 4d ago |
| MT-Bench benign prompts | | Average Time Cost | 41.56 | 6 | 4d ago |