| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| Accuracy Tasks Zero-shot (AC, AE, WI, QA) | | AC Score 62 | 52 | 1mo ago | |
| LM-Eval-Harness Suite (PIQA, HellaSwag, LAMBADA, ARC-e, ARC-c, SciQ, Race, MMLU) zero-shot | | PIQA 80.7 | 32 | 1mo ago | |
| Evaluation Suite Zero-shot (ARC, LogiQA, Wino, CSQA, BoolQ, PIQA, MMLU) | Avg-DeUS | ARC 83.88 | 21 | 1mo ago | |
| Downstream Suite Zero-shot (PIQA, HS, ARC, WG, RTE, OQA, BoolQ) | Meta-Llama-3-8B Dense | PIQA Accuracy 80.79 | 12 | 1mo ago | |