| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy: 7,364 | 1,442 | |
| Commonsense Reasoning | Winogrande | Accuracy: 85.3 | 453 | |
| Common sense reasoning | Winogrande | Accuracy: 91.3 | 189 | |
| Reasoning | WinoGrande (WG) | Accuracy: 85.2 | 168 | |
| Question Answering | Winogrande (WG) | Accuracy: 74.1 | 138 | |
| Commonsense Reasoning | Winogrande | Accuracy: 76.09 | 103 | |
| Commonsense Reasoning | WinoGrande (val) | Accuracy: 73.88 | 87 | |
| Commonsense Reasoning | WinoGrande 5-shot | Accuracy: 92.66 | 85 | |
| Commonsense Question Answering | WinoGrande | Accuracy: 77.82 | 73 | |
| Commonsense Reasoning | Winogrande | Accuracy: 69.77 | 68 | |
| Coreference Resolution | Winogrande | Accuracy: 73.6 | 61 | |
| Pronoun Resolution | WinoGrande | Accuracy: 89.4 | 58 | |
| Zero-shot Reasoning | WinoGrande | Accuracy: 70 | 54 | |
| Commonsense Reasoning | Winogrande | Accuracy (0-shot): 73.7 | 42 | |
| Winograd Schema Challenge | WinoGrande | Accuracy: 76.56 | 39 | |
| Commonsense Reasoning | WinoGrande standard (test) | Accuracy: 80.2 | 39 | |
| Language Understanding | WinoGrande | Accuracy: 80.82 | 38 | |
| Commonsense reasoning | WinoGrande 1.0 (test) | Accuracy: 74.1 | 31 | |
| Identification of inactive attention heads | WinoGrande | Percentage of Zeroed Heads: 20.91 | 30 | |
| Natural Language Understanding | Winogrande | Accuracy: 59 | 30 | |
| Zero-shot Accuracy | WinoGrande | Zero-shot Accuracy: 77.3 | 30 | |
| Commonsense Reasoning | Winogrande | Accuracy: 80.71 | 24 | |
| Commonsense Reasoning | Winogrande | Accuracy: 82.35 | 23 | |
| Commonsense Reasoning | Winogrande | Winogrande Score: 73.88 | 22 | |
| Commonsense Question Answering | WinoGrande (WG) (val) | Accuracy: 78.3 | 21 | |