| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Reasoning Suite Zero-shot (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (val test) | | PIQA | 81.77 | 119 | 4d ago |
| ARC-e, Winogrande, HellaSwag, PIQA | | Normalized Avg Accuracy | 77.2 | 36 | 4d ago |
| PIQA | | PIQA Zero-shot Accuracy | 80.9 | 31 | 4d ago |
| HellaSwag | | Accuracy | 76.3 | 29 | 4d ago |
| Evaluation Suite Zero-shot (OpenbookQA, ARC-e, ARC-c, WinoGrande, HellaSwag, PIQA, MathQA) | LittleBit | Average Accuracy | 54.92 | 24 | 4d ago |
| WinoGrande | | Accuracy | 69 | 23 | 4d ago |
| ARC-Easy zero-shot | WeDLM-8B | Zero-shot Accuracy | 97.43 | 22 | 4d ago |
| benchmark datasets (PIQA, HeSw, ARC-e, ARC-c, OBQA, Race, WSC273, LAMBADA, MMLU) Zero-shot | LLaMA-2-7B-hf | PIQA | 78.07 | 21 | 4d ago |
| ZeroShot 7 | | Accuracy | 67 | 16 | 4d ago |
| Zero-shot Average | | Accuracy | 66 | 11 | 4d ago |
| Reasoning Suite (ARC-e, ARC-c, HellaSwag, PIQA, Winogrande) zero-shot | LLaMA-2 (FP16) | ARC-e Accuracy | 0.7559 | 8 | 4d ago |
| StoryCloze zero-shot | | Accuracy | 79.95 | 8 | 4d ago |
| MathQA | | Accuracy | 28.4 | 7 | 4d ago |
| OpenbookQA | | Accuracy | 44 | 7 | 4d ago |