| Dataset Name | SOTA Method | Metric | Trend | Last Updated |
|---|---|---|---|---|
| Zero-shot Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c) (test) | LLaMA-2 70B | PIQA: 82.7 | 95 | 4d ago |
| Commonsense Reasoning Benchmarks (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | GradMAP | Avg Accuracy: 48.92 | 20 | 2d ago |
| Common Sense Reasoning | | Zero-shot Accuracy: 66.07 | 8 | 4d ago |