| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Common Sense Reasoning | | Zero-shot Accuracy 72.6 | 137 | 1mo ago | |
| Zero-shot Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (test) | LLaMA-2 70B | PIQA 82.7 | 95 | 3mo ago | |
| Commonsense Reasoning Benchmarks (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | | Avg Accuracy 70.185 | 63 | 2mo ago | |
| BoolQ | Gated DeltaNet-H1 | Accuracy 63.21 | 12 | 1mo ago | |
| SIQA | Gated DeltaNet-H2 | Accuracy (Zero-shot) 41.91 | 12 | 1mo ago | |
| Reasoning Benchmarks Zero-shot (ARC-e, ARC-c, BoolQ, PIQA, SIQA, HellaSwag, OBQA, WinoGrande) | StatQAT-iterative | ARC-e Accuracy 74.2 | 8 | 15d ago | |