| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | Commonsense Reasoning Suite (test) | HellaSwag Accuracy: 0.9594 | 62 |
| Commonsense Reasoning | Commonsense Reasoning Suite | OpenBookQA Accuracy: 35 | 48 |
| Commonsense Reasoning | Commonsense Reasoning Suite BoolQ, PIQA, HellaS, WinoG, ARC-e, ARC-c, OBQA | Average Accuracy: 71.77 | 43 |
| Commonsense Reasoning | Commonsense Reasoning Suite BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c | BoolQ Accuracy: 87.49 | 43 |
| Zero-shot Commonsense Reasoning | Commonsense Reasoning Suite | BoolQ Accuracy: 73.18 | 32 |
| Commonsense Reasoning | Commonsense Reasoning Suite (ARC-e, OBQA, SIQA, ARC-c, WinoG., PIQA) | ARC-e Accuracy: 88 | 24 |
| Commonsense Reasoning | Commonsense Reasoning Suite (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQ) (test) | BoolQ Accuracy: 63.27 | 16 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PiQA, Arc-C, WinoGrande, HellaSwag, SciQ, OBQA, BoolQ, Arc-E) (test) | PiQA Accuracy: 80.79 | 15 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PiQA, Arc-C, WinoGrande, HellaSwag, SciQ, OBQA, BoolQ, Arc-E) | PiQA Accuracy: 82.21 | 15 |
| Commonsense Reasoning | Commonsense Reasoning Suite LM Eval Harness | LAMBADA: 51.8 | 13 |
| Question Answering | Commonsense Reasoning Suite (ARC-e, ARC-c, BoolQ, OBQA, PIQA) (test) | ARC-e: 77.7 | 8 |
| Zero-shot Question Answering | Commonsense Reasoning Suite (PIQA, WinoGrande, HellaSwag, ARC) Zero-shot Llama-2-70B | PIQA Accuracy (Zero-shot): 82.7 | 7 |
| Commonsense Reasoning | Commonsense Reasoning Suite (Arc, Hellaswag, Obqa, Piqa, Race, Siqa, Winogrande) (test) | Arc-c: 26.54 | 4 |