| Dataset Name | SOTA Method | Metric | Trend | ||
|---|---|---|---|---|---|
| ARC-Easy, ARC-Challenge, HellaSwag, PIQA, WinoGrande lm-evaluation-harness (test) | | ARC-e Accuracy 82.87 | | 43 | 21d ago |
| Commonsense Reasoning Suite | | BoolQ Accuracy 73.18 | | 32 | 2mo ago |
| Commonsense Reasoning PIQA HellaSwag WinoGrande ARC-Easy OpenBookQA MathQA (test) | | Zero-shot Accuracy 59 | | 21 | 16d ago |
| Zero-shot tasks (test) | | Average Accuracy 61 | | 12 | 3mo ago |
| PIQA zero-shot | | Accuracy 76.93 | | 9 | 28d ago |
| Reasoning Suite Zero-shot (ARC-E, BoolQ, HSwag, LAMBADA, OBQA, PIQA, SocIQA, WinoGr.) | PathMoE | ARC-E Accuracy 45.5 | | 9 | 2mo ago |
| Standard Commonsense Reasoning Suite (HellaSwag, PIQA, ARC-e, ARC-c, Winogrande, BoolQ, LAMBADA) | | HellaSwag Accuracy 44.7 | | 7 | 2mo ago |
| CSQA | SLEB-pruned LLaMA2-7B | PIQA 83.19 | | 6 | 3mo ago |
| Winogrande zero-shot | | Accuracy (zero-shot) 67.17 | | 4 | 1mo ago |