| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| QA Suite (PIQA, Winogrande, HellaSwag, ARC-E, ARC-C, LAMBADA) Zero-shot | QuaRot | PIQA Accuracy: 78.24 | 56 | 13d ago |
| AveQA | AMP | Accuracy: 57.7 | 25 | 3mo ago |
| BoolQ, Winogrande, PIQA, OpenBookQA, HellaSwag, ARC-Easy, ARC-Challenge Zero-shot | | BoolQ Accuracy (Zero-shot): 83.76 | 21 | 2mo ago |
| Benchmarks (ArcC, ArcE, PiQA, Wino) Zero-shot | | ARC-C Accuracy: 43.43 | 17 | 3mo ago |
| Downstream Reasoning Tasks ARC-c, ARC-e, BoolQ, HellaSwag, MMLU, OpenBookQA, PIQA, Winogrande | | ARC-c Accuracy (Zero-shot): 58.4 | 15 | 3mo ago |
| OpenBookQA, PIQA, RACE, SciQ, WinoGrande | Hyperloop | Accuracy: 54.6 | 12 | 1mo ago |
| Commonsense Reasoning Suite (PIQA, WinoGrande, HellaSwag, ARC) Zero-shot Llama-2-70B | | PIQA Accuracy (Zero-shot): 82.7 | 7 | 1mo ago |
| OpenBookQA | Yat (sb+α) | Accuracy: 28.4 | 5 | 28d ago |
| ARC Easy | GELU | Accuracy: 33.53 | 5 | 28d ago |
| MMLU zero-shot | Pre-trained | Average Loss: 1.03 | 2 | 3mo ago |