| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Zero-shot Common-sense Reasoning | Commonsense Reasoning Benchmarks (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) zero-shot | Avg Accuracy: 48.92 | 20 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks zero-shot LLaMA-2-13B | BoolQ Accuracy (Zero-shot): 80.92 | 17 |
| Commonsense Reasoning | Commonsense Reasoning Benchmarks Aggregate | Score: 71.9 | 12 |