| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | Commonsense Reasoning Suite BoolQ, PIQA, HellaS, WinoG, ARC-e, ARC-c, OBQA | Average Accuracy: 70.39 | 37 |
| Commonsense Reasoning | Commonsense Reasoning Suite BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c | BoolQ Accuracy: 83.03 | 28 |
| Commonsense Reasoning | Commonsense Reasoning Suite (test) | Avg Accuracy: 0.7418 | 22 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PiQA, Arc-C, WinoGrande, HellaSwag, SciQ, OBQA, BoolQ, Arc-E) (test) | PiQA Accuracy: 80.79 | 15 |
| Commonsense Reasoning | Commonsense Reasoning Suite (PiQA, Arc-C, WinoGrande, HellaSwag, SciQ, OBQA, BoolQ, Arc-E) | PiQA Accuracy: 82.21 | 15 |
| Question Answering | Commonsense Reasoning Suite (ARC-e, ARC-c, BoolQ, OBQA, PIQA) (test) | ARC-e: 77.7 | 8 |