| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Common Sense Reasoning | Common Sense Reasoning Suite (PIQA, ARC-Easy, ARC-Challenge, BoolQ, HellaSwag, Winogrande) zero-shot (test dev) | PIQA: 79.35 | 30 |
| Common Sense Reasoning | Common Sense Reasoning Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | PIQA: 81.12 | 20 |
| Common-sense Reasoning | Common-sense Reasoning Suite (PIQA, HellaSwag, ARC-C, ARC-E, OBQA) (test) | PIQA Accuracy: 84.1 | 18 |
| Commonsense Reasoning | Common-sense Reasoning Suite zero-shot | Average Accuracy (8 Tasks): 52.92 | 10 |
| Common Sense Reasoning | Common Sense Reasoning Suite (ARC, BoolQ, RTE, Winogrande, TruthfulQA) | ARC Challenge Accuracy: 34 | 8 |
| Common-sense Reasoning | Common-sense Reasoning Suite | Average Score (8 Tasks): 42.91 | 6 |
| Zero-shot task classification | Common Sense Reasoning Suite (PIQA, HellaSwag, WSC, BoolQ, RACE-H) zero-shot | PIQA: 71.22 | 5 |