| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | Commonsense Reasoning Suite BoolQ, PIQA, HellaS, WinoG, ARC-e, ARC-c, OBQA | Average Accuracy: 71.77 | 81 |
| Commonsense Reasoning | Commonsense Reasoning Suite (test) | HellaSwag Accuracy: 0.9594 | 62 |
| Commonsense Reasoning | Commonsense Reasoning Suite | OpenBookQA Accuracy: 35 | 48 |
| Commonsense Reasoning | Commonsense Reasoning Suite Boolq, PIQA, SIQA, Win, OBQA, HellaSwag, ARC-E, ARC-C | BoolQ Accuracy: 77.5 | 44 |
| Commonsense Reasoning | Commonsense Reasoning Suite BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c | BoolQ Accuracy: 87.49 | 43 |
| Commonsense Reasoning | Commonsense Reasoning Suite (ARC-c, BoolQ, PIQA, HellaSwag, WinoGrande) zero-shot | ARC-c Accuracy: 55.03 | 35 |
| Zero-shot Commonsense Reasoning | Commonsense Reasoning Suite | BoolQ Accuracy: 73.18 | 32 |
| Commonsense Reasoning | Commonsense Reasoning Suite (ARC-e, OBQA, SIQA, ARC-c, WinoG., PIQA) | ARC-e Accuracy: 88 | 24 |
| Commonsense Reasoning | Commonsense Reasoning Suite LM Eval Harness | LM Eval Score: 48.89 | 20 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PIQA, SIQA, HellaSwag, ARC-E, ARC-C, WinoGrande, LAMBADA) zero-shot | PIQA Accuracy: 66.92 | 18 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PIQA, HellaSwag, WinoGrande, ARC-Easy, ARC-Challenge) zero-shot LLaMA-2-7B | PIQA Accuracy: 78 | 17 |
| Commonsense Reasoning | Commonsense Reasoning Suite (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQ) (test) | BoolQ Accuracy: 63.27 | 16 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PiQA, Arc-C, WinoGrande, HellaSwag, SciQ, OBQA, BoolQ, Arc-E) (test) | PiQA Accuracy: 80.79 | 15 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PiQA, Arc-C, WinoGrande, HellaSwag, SciQ, OBQA, BoolQ, Arc-E) | PiQA Accuracy: 82.21 | 15 |
| Commonsense Reasoning | Commonsense Reasoning Suite Zero-shot | PIQA Accuracy: 64.36 | 9 |
| Question Answering | Commonsense Reasoning Suite (ARC-e, ARC-c, BoolQ, OBQA, PIQA) (test) | ARC-e: 77.7 | 8 |
| Few-shot Commonsense Reasoning | Commonsense Reasoning Suite HellaSwag, PIQA, OBQA, COPA, WinoGrande | HellaSwag Accuracy: 27.3 | 7 |
| Commonsense Reasoning | Commonsense Reasoning Suite (HellaSwag, PIQA, OBQA, COPA, WinoGrande) | HellaSwag Accuracy: 32.8 | 7 |
| Zero-shot Question Answering | Commonsense Reasoning Suite (PIQA, WinoGrande, HellaSwag, ARC) Zero-shot Llama-2-70B | PIQA Accuracy (Zero-shot): 82.7 | 7 |
| Commonsense Reasoning | Commonsense Reasoning Suite OBQA, WinoGrande, ARC-c, ARC-e, HellaSwag, SIQA, PIQA | OBQA Score: 29.6 | 5 |
| Question Answering | Commonsense Reasoning Suite (test) | ARC-c: 48.27 | 5 |
| Commonsense Reasoning | Commonsense Reasoning Suite (MMLU, ARC, PIQA, HellaSwag, OpenBookQA, Winogrande) | Average Accuracy: 63.84 | 4 |
| Commonsense Reasoning | Commonsense Reasoning Suite (Arc, Hellaswag, Obqa, Piqa, Race, Siqa, Winogrande) (test) | Arc-c: 26.54 | 4 |
| Commonsense Reasoning | Commonsense Reasoning Suite (OBQA, HellaSwag, ARC-E, WSC, Winogrande, BoolQ, PIQA) | Average Accuracy: 49.4 | 2 |