| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | Common Sense Reasoning Tasks | Avg Score: 93 | 321 |
| Common Sense Reasoning | Common Sense Reasoning Tasks (ARC-C, ARC-E, BoolQ, HellaSwag, PIQA, WinoGrande) zero-shot | Average Accuracy (Zero-Shot): 77.67 | 92 |
| Zero-shot Reasoning | 9 Common Sense Reasoning Tasks (WinoGrande, SocialIQA, LAMBADA, MMLU, ARC-Easy, ARC-Challenge, HellaSwag, OpenBookQA, PIQA) Average | Accuracy: 72.7 | 65 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-3-8B | Average Accuracy: 87.07 | 27 |
| Common-Sense Reasoning | Common-Sense Reasoning Tasks | Wiki PPL: 16 | 18 |
| Common-sense Reasoning | Common-sense reasoning tasks (ARC-C, ARC-E, HellaSwag, Lambada, PIQA, WinoGrande) (test) | ARC-C Accuracy: 44.88 | 16 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-2-70B | Average Accuracy: 72.41 | 15 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-2-13B | Accuracy: 67.81 | 15 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-3-70B | Average Accuracy: 75.33 | 9 |
| Common-sense Reasoning | Common-sense reasoning tasks WikiText, LAMBADA, PIQA, HellaSwag, WinoGrande, ARC | PPL (WikiText): 25.89 | 8 |