| Dataset Name | SOTA Method | Metric | Trend | Last Updated |
|---|---|---|---|---|
| LLM Evaluation Suite (ARC-e, ARC-c, HellaSwag, OBQA, WinoGrande, MathQA, PIQA) | — | Average Accuracy: 64.01 | 19 | 4d ago |
| MMLU, PIQA, HellaSwag, WinoGrande, ARC-Challenge | — | MMLU (5-shot): 46.45 | 13 | 4d ago |
| LM-Evaluation-Harness (ARC-c, ARC-e, BoolQ, HellaSwag, MMLU, OBQA, PIQA, WinoGrande) | — | ARC-c Accuracy: 58.4 | 12 | 4d ago |
| MMLU, BoolQ, ARC-e, PIQA, HellaSwag, OBQA, WinoGrande (test) | Pre-LN + LayerNorm Scaling | MMLU: 28.69 | 10 | 4d ago |
| Language Understanding Evaluation Suite (ARC-c, ARC-e, BoolQ, COPA, MMLU, OBQA, PIQA, RTE, WinoGrande), Zyda2 calibration (test) | Moonlight | ARC-c: 58.28 | 6 | 4d ago |
| Combined (GSM8k, MATH500, MAWPS, SVAMP, AQuA, GLUE, CSQA, OBQA) | MoSLoRA | Average Score: 72.94 | 5 | 2d ago |