| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Question Answering | Downstream Reasoning Tasks (ARC-c, ARC-e, BoolQ, HellaSwag, MMLU, OpenBookQA, PIQA, Winogrande) | ARC-c Accuracy (Zero-shot): 58.4 | 15 |
| Zero-shot Reasoning | Downstream Reasoning Tasks (WikiText-2, ARC-e, ARC-c, BoolQ, PIQA, SIQA, HellaS., OBQA, Wino.) | WikiText-2 Acc: 11.78 | 6 |