| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Language Evaluation | Zero-shot NLP Evaluation Suite (WikiText2, BoolQ, PIQA, HellaSwag, WinoGrande, ARC, OBQA, MTQA) (test) | WikiText2 Perplexity: 7.43 | 27 |
| General Language Understanding | NLP Evaluation Suite (SciQ, PIQA, WG, ARC, HellaSwag, LogiQA, BoolQ, LAMBADA) | SciQ Accuracy: 58.3 | 14 |
| Language Model Evaluation | NLP Evaluation Suite (WG, PIQA, BoolQ, ARC-C, ARC-E, OBQA, HS, SciQ, LM, RTE) | WG: 60.14 | 6 |