| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| 14-Benchmark Evaluation Suite | Qwen-2.5-7B-TuluSFT | Average Score | 62.05 | 72 | 3mo ago |
| Aggregated MMLU, BoolQ, OpenBookQA, RTE | Mixtral-8x22B | Average Accuracy | 70.4 | 42 | 1mo ago |
| 5 Datasets Zero-shot | QuEPT | Average Accuracy | 72.87 | 33 | 3mo ago |
| English lm-evaluation-harness | OjaKV | ARC Easy Acc (Norm) | 0.819 | 16 | 1mo ago |
| Aggregated Benchmarks | Qwen3-14B + NGM | Average Score | 0.7449 | 10 | 15d ago |
| Open LLM Leaderboard 1 | UM-190k | Overall Score | 66.12 | 9 | 3mo ago |
| 12 general benchmarks Avg | | General Average Score | 68.24 | 3 | 1mo ago |
| OLMo-2 Held-out Evals | OLMo-2-0425-1B | AGIEval Score | 24.4 | 2 | 3mo ago |