| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Common-sense Reasoning | Commonsense Reasoning Benchmarks (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | Avg Accuracy: 70.185 | 63 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks zero-shot LLaMA-2-13B | BoolQ Accuracy (Zero-shot): 80.92 | 17 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks Aggregate | Score: 71.9 | 12 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks (PIQA, ARC, HS, WG, BoolQ, MMLU) zero-shot | PIQA Accuracy (Zero-shot): 82.1 | 10 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks (HellaSwag, WinoGrande, BoolQ) | HellaSwag: 73.6 | 5 |