| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Downstream Evaluation | Downstream Suite Zero-shot (ARC-E, ARC-C, HellaS., PIQA, WG, OBQA, SciQ, BoolQ) | ARC-Easy Accuracy: 74.82 | 26 |
| Zero-shot Downstream Accuracy | Downstream Suite Zero-shot (BoolQ, HellaSwag, PIQA, RACE, WinoGrande) | BoolQ Accuracy: 82.4 | 19 |
| Zero-shot Question Answering and Reasoning | Downstream Suite Zero-shot (PIQA, HS, ARC, WG, RTE, OQA, BoolQ) | PIQA Accuracy: 80.79 | 12 |
| General Evaluation | Downstream Suite | Average Score: 39.38 | 8 |
| Downstream Task Evaluation | Downstream Suite (BoolQ, PIQA, HS, WG, ARC-e, ARC-c, OBQA) Zero-shot | Accuracy (BoolQ): 77.7 | 5 |