| Dataset Name | SOTA Method | Metric | Count | Last Updated |
|---|---|---|---|---|
| Commonsense Reasoning Suite | | BoolQ Accuracy 73.18 | 32 | 1mo ago |
| ARC-Easy, ARC-Challenge, HellaSwag, PIQA, WinoGrande lm-evaluation-harness (test) | | ARC-e Accuracy 74.5 | 18 | 1mo ago |
| Zero-shot tasks (test) | | Average Accuracy 61 | 12 | 1mo ago |
| Reasoning Suite Zero-shot (ARC-E, BoolQ, HSwag, LAMBADA, OBQA, PIQA, SocIQA, WinoGr.) | PathMoE | ARC-E Accuracy 45.5 | 9 | 1mo ago |
| Standard Commonsense Reasoning Suite (HellaSwag, PIQA, ARC-e, ARC-c, Winogrande, BoolQ, LAMBADA) | | HellaSwag Accuracy 44.7 | 7 | 24d ago |
| CSQA | SLEB-pruned LLaMA2-7B | PIQA 83.19 | 6 | 1mo ago |