| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Reasoning | Reasoning Tasks (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) Zero-shot | BoolQ Accuracy (Zero-shot): 82.813 | 55 |
| Zero-shot Reasoning | Zero-Shot Reasoning Tasks (ARC-C, ARC-E, BoolQ, Hella, OBQA, PIQA, SIQA, Wino) | ARC-C Accuracy: 65.53 | 54 |
| Reasoning | Reasoning Tasks Average | Average Score: 68.6 | 32 |
| Zero-shot Evaluation | Reasoning tasks | Reasoning Accuracy: 70.7 | 7 |