| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| AveQA | AMP | Accuracy 57.7 | 25 | 1mo ago |
| BoolQ, Winogrande, PIQA, OpenBookQA, HellaSwag, ARC-Easy, ARC-Challenge Zero-shot | | BoolQ Accuracy (Zero-shot) 83.76 | 21 | 1mo ago |
| Benchmarks (ArcC, ArcE, PiQA, Wino) Zero-shot | | ARC-C Accuracy 43.43 | 17 | 1mo ago |
| Downstream Reasoning Tasks ARC-c, ARC-e, BoolQ, HellaSwag, MMLU, OpenBookQA, PIQA, Winogrande | | ARC-c Accuracy (Zero-shot) 58.4 | 15 | 1mo ago |
| Commonsense Reasoning Suite (PIQA, WinoGrande, HellaSwag, ARC) Zero-shot Llama-2-70B | | PIQA Accuracy (Zero-shot) 82.7 | 7 | 3d ago |
| MMLU zero-shot | Pre-trained | Average Loss 1.03 | 2 | 1mo ago |