| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero/Few-shot Language Modeling | Standard Downstream Tasks (arc-c, arc-e, boolq, hellaswag, piqa, siqa, winogrande) | ARC-C Accuracy: 70.65 | 55 |
| Zero-shot Reasoning and Question Answering | Standard Downstream Tasks (PIQA, HellaSwag, Winogrande, ARC-Challenge, ARC-Easy) | PIQA Zero-Shot Accuracy: 77.47 | 9 |
| Language Understanding | Standard Downstream Tasks (ARC, COPA, BoolQ, PIQA, StoryCloze, RTE, MMLU) | ARC (Challenge): 49.57 | 8 |