| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning and Knowledge Question Answering | General Ability Suite (ARC, HellaSwag, PIQA, BoolQ, WinoGrande, COPA, OBQA, SciQ) various (test) | ARC-C Accuracy: 36.4 | 19 |
| General Language Understanding | General Ability Suite (C-QA, T-QA, LAM, MMLU, L-Code) | Average Score: 48.1 | 16 |
| Commonsense Reasoning and Knowledge Question Answering | General Ability Suite ARC, HellaSwag, PIQA, BoolQ, WinoGrande, COPA, OBQA, SciQ | ARC-C Accuracy: - | 0 |