| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot General Evaluation | Zero-shot Task Suite (HellaSwag, MathQA, MMLU, OpenBookQA, WinoGrande, GSM8K, HumanEval) | HellaSwag Accuracy: 82.72 | 31 |
| Common Sense Reasoning and Question Answering | Task Suite Zero-shot (ARC-e, ARC-c, HellaSwag, OBQA, WinoGrande, MathQA, PIQA) | ARC-e: 83.54 | 17 |