| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| 14-Benchmark Evaluation Suite | Qwen-2.5-7B-TuluSFT | Average Score 62.05 | 72 | 1mo ago | |
| Aggregated MMLU, BoolQ, OpenBookQA, RTE | Mixtral-8x22B | Average Accuracy 70.4 | 42 | 9d ago | |
| 5 Datasets Zero-shot | QuEPT | Average Accuracy 72.87 | 33 | 1mo ago | |
| Open LLM Leaderboard 1 | UM-190k | Overall Score 66.12 | 9 | 1mo ago | |
| 12 general benchmarks Avg | | General Average Score 68.24 | 3 | 9d ago | |
| OLMo-2 Held-out Evals | OLMo-2-0425-1B | AGIEval Score 24.4 | 2 | 1mo ago | |
| English lm-evaluation-harness | Transformer + Spelling Bee Embeddings | AGIEval Acc (Norm) 0.259 | 2 | 1mo ago | |