| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | Common Sense Reasoning Tasks | Avg Score 93 | 241 |
| Common Sense Reasoning | Common Sense Reasoning Tasks (ARC-C, ARC-E, BoolQ, HellaSwag, PIQA, WinoGrande) zero-shot | Average Accuracy (Zero-Shot) 74.19 | 72 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-3-8B | Average Accuracy 87.07 | 27 |
| Common-sense Reasoning | Common-sense reasoning tasks (ARC-C, ARC-E, HellaSwag, Lambada, PIQA, WinoGrande) (test) | ARC-C Accuracy 44.88 | 16 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-2-70B | Average Accuracy 72.41 | 15 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-2-13B | Accuracy 67.81 | 15 |
| Common-sense Reasoning | 5 common-sense reasoning tasks Llama-3-70B | Average Accuracy 75.33 | 9 |