| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling and Zero-shot Multiple-Choice Reasoning | Downstream Evaluation Suite Zero-shot (FW-Edu, Wiki., LAMBADA, PIQA, HellaSwag, WinoGrande, ARC, SIQA, SciQ) (val) | FW-Edu Perplexity: 10.52 | 9 |
| Language Modeling | Downstream Evaluation Suite (ARC-C, Hellaswag, PIQA, SciQ, WinoGrande, SocialIQA, RACE) zero-shot (test) | ARC-C Accuracy: 48.7 | 9 |
| Zero-shot Downstream Task Evaluation | Downstream Evaluation Suite (Arc-e, PIQA, Hellaswag, OpenBookQA, Winogrande, MMLU, BoolQ) | Arc-e: 53.83 | 4 |