| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering and Commonsense Reasoning | Short-context benchmarks ARC-C, ARC-E, PIQA, Winogrande, HellaSwag | ARC-C Accuracy: 63.48 | 45 |
| Multiple Choice Question Answering and Reasoning | Short Context Benchmarks MMLU, SciQ, OQA, CQA, SIQA, PIQA, HellaSwag, WinoGrande, ARC-c, ARC-e | MMLU Accuracy: 74.25 | 10 |