| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| LLM Evaluation Suite (MMLU, GSM8k, HellaSwag, WinoGrande) | | MMLU Score 64.43 | 31 | 1mo ago | |
| LLM Evaluation Suite (ARC-e, ARC-c, HellaSwag, OBQA, WinoGrande, MathQA, PIQA) | | Average Accuracy 64.01 | 19 | 1mo ago | |
| MMLU, PIQA, HellaSwag, WinoGrande, ARC-Challenge | | MMLU (5s) 46.45 | 13 | 1mo ago | |
| LM-Evaluation-Harness (ARC-c, ARC-e, BoolQ, HellaS., MMLU, OBQA, PIQA, WG) | | ARC-c Accuracy 58.4 | 12 | 1mo ago | |
| MMLU, BoolQ, ARC-e, PIQA, Hellaswag, OBQA, Winogrande (test) | Pre-LN + LayerNorm Scaling | MMLU 28.69 | 10 | 1mo ago | |
| Language Understanding Evaluation Suite (Arc-c, Arc-e, BoolQ, COPA, MMLU, OBQA, PIQA, RTE, Winogrande) Zyda2 calibration (test) | Moonlight | ARC-c 58.28 | 6 | 1mo ago | |
| OLMES Standard | OLMo-2-0425-1B | ARC-Easy Accuracy 75.9 | 5 | 1mo ago | |
| Combined (GSM8k, MATH500, MAWPS, SVAMP, AQuA, GLUE, CSQA, OBQA) | MoSLoRA | Average Score 72.94 | 5 | 1mo ago | |
| Mixtral-8x7B Zero-shot Suite: Arc C, BoolQ, Lambada, PIQA, Winogrande v0.1-Instruct | | Arc C Accuracy 65.7 | 2 | 19d ago | |