| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| General Language Understanding | NLP Evaluation Suite (SciQ, PIQA, WG, ARC, HellaSwag, LogiQA, BoolQ, LAMBADA) | SciQ Accuracy: 58.3 | 14 |
| Language Model Evaluation | NLP Evaluation Suite (WG, PIQA, BoolQ, ARC-C, ARC-E, OBQA, HS, SciQ, LM, RTE) | WG: 60.14 | 6 |