| Benchmark Suite | SOTA Method | Metric | Score | Entries | Last Updated |
|---|---|---|---|---|---|
| ARC-Easy, ARC-Challenge, HellaSwag, LAMBADA, PIQA (lm-eval 0.4.11, test) | | Average Accuracy | 81.5 | 42 | 18 days ago |
| Zero-shot suite (LAMBADA, HellaSwag, PIQA, ARC-E, ARC-C, WinoGrande, OpenBookQA, MMLU) | | ARC-E Accuracy | 83.4 | 25 | 1 month ago |
| Zero-shot reasoning suite (BoolQ, WinoGrande, PIQA, OpenBookQA, HellaSwag, ARC-E, ARC-C) | SparseGPT | BoolQ Accuracy | 82.63 | 24 | 1 month ago |
| LLM evaluation suite (MMLU, GSM8K, HellaSwag, WinoGrande) | | MMLU Accuracy | 72.8 | 12 | 1 month ago |
| Zero-shot benchmarks | | Average Zero-shot Accuracy | 51.47 | 9 | 1 month ago |
| BoolQ, ARC-E, ARC-C, WinoGrande, HellaSwag | MoEITS | ARC-E Accuracy | 83.08 | 8 | 5 days ago |
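The "Average Accuracy" scores above are typically the unweighted mean of the per-task accuracies in a suite. A minimal sketch of that aggregation, with illustrative (not actual) per-task scores:

```python
def average_accuracy(per_task: dict[str, float]) -> float:
    """Unweighted mean of per-task accuracy scores, as commonly
    reported by evaluation harnesses such as lm-eval."""
    return sum(per_task.values()) / len(per_task)

# Hypothetical per-task results for the first suite above.
scores = {
    "arc_easy": 0.85,
    "arc_challenge": 0.60,
    "hellaswag": 0.80,
    "lambada": 0.75,
    "piqa": 0.82,
}
print(round(average_accuracy(scores), 4))  # → 0.764
```

Note that harnesses can also report micro-averages weighted by example count, so published suite averages may differ slightly from the plain mean shown here.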