| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) (test) | BoolQ Accuracy | 88 | 138 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaS., WinoG., ARC-e, ARC-c, OBQA) | BoolQ Accuracy | 74.6 | 61 |
| Commonsense Reasoning | Commonsense Reasoning | Accuracy | 85 | 44 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, HellaSwag, Winogrande) zero-shot | Avg Commonsense Accuracy | 84.9 | 34 |
| Commonsense Reasoning | Commonsense Reasoning (PIQA, WinoG., HellaS., BoolQ, SIQA, OBQA) (test) | PIQA Accuracy | 89.9 | 32 |
| Commonsense Reasoning | Commonsense Reasoning (OBQA, ARC-C, Wino, PIQA, Social, ARC-E, BoolQ, Hella) | OBQA | 94.8 | 24 |
| Commonsense Reasoning | Commonsense Reasoning LLaMA2-7B | Average Accuracy | 79.68 | 18 |
| Commonsense Reasoning | Commonsense Reasoning Tasks (ARC-C, ARC-E, HellaSwag, LAMBADA, PIQA, WinoGrande) | ARC-C Accuracy | 41.47 | 13 |
| Commonsense Reasoning | Commonsense Reasoning (BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, OBQA) LLaMA2 7B backbone (test) | BoolQ Accuracy | 88.5 | 10 |