| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Common-sense Reasoning | Common-sense Reasoning Benchmarks Zero-shot | BoolQ Accuracy: 80.1 | 16 |
| Common Sense Reasoning | Six common sense reasoning benchmarks (ARC-e, PIQA, OpenbookQA, Winogrande, HellaSwag, MathQA) | Average Accuracy: 61 | 15 |
| Common-sense Reasoning | Common-sense Reasoning Benchmarks (BoolQ, SciQ, PIQA, WinoG., ARC-C, HellaS.) zero-shot | BoolQ Accuracy: 81 | 2 |