| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Common Sense Reasoning | Common Sense Reasoning Suite (PIQA, ARC-Easy, ARC-Challenge, BoolQ, HellaSwag, Winogrande) zero-shot (test dev) | PIQA 79.35 | 30 |
| Common Sense Reasoning | Common Sense Reasoning Suite (PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | PIQA 81.12 | 20 |
| Common-sense Reasoning | Common-sense Reasoning Suite (PIQA, HellaSwag, ARC-C, ARC-E, OBQA) (test) | PIQA Accuracy 84.1 | 18 |
| Common Sense Reasoning | Common Sense Reasoning Suite ARC, BoolQ, RTE, Winogrande, TruthfulQA | ARC Challenge Accuracy 34 | 8 |
| Zero-shot task classification | Common Sense Reasoning Suite (PIQA, HellaSwag, WSC, BoolQ, RACE-H) zero-shot | PIQA 71.22 | 5 |