| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy | 7,364 | 1,085 |
| Commonsense Reasoning | Winogrande | Accuracy | 85.3 | 372 |
| Common sense reasoning | Winogrande | Accuracy | 91.3 | 189 |
| Reasoning | WinoGrande (WG) | Accuracy | 85.2 | 135 |
| Question Answering | Winogrande (WG) | Accuracy | 72.77 | 124 |
| Commonsense Reasoning | WinoGrande (val) | Accuracy | 73.88 | 87 |
| Commonsense Reasoning | Winogrande | Accuracy | 76.09 | 78 |
| Commonsense Reasoning | Winogrande | Accuracy | 69.77 | 68 |
| Commonsense Reasoning | WinoGrande 5-shot | Accuracy | 92.66 | 64 |
| Zero-shot Reasoning | WinoGrande | Accuracy | 70 | 54 |
| Commonsense Reasoning | Winogrande | Accuracy (0-shot) | 73.7 | 42 |
| Pronoun Resolution | WinoGrande | Accuracy | 89.4 | 41 |
| Coreference Resolution | Winogrande | Accuracy | 73.2 | 40 |
| Commonsense Reasoning | WinoGrande standard (test) | Accuracy | 80.2 | 35 |
| Identification of inactive attention heads | WinoGrande | Percentage of Zeroed Heads | 20.91 | 30 |
| Zero-shot Accuracy | WinoGrande | Zero-shot Accuracy | 77.3 | 30 |
| Commonsense Question Answering | WinoGrande (WG) (val) | Accuracy | 78.3 | 21 |
| LLM Performance Estimation | Winogrande (test) | MAE | 1.027 | 20 |
| Commonsense Reasoning | Winogrande | Accuracy | 0.8624 | 19 |
| Commonsense Reasoning | Winogrande | Accuracy (Pre-Attack) | 73.2 | 19 |
| Multiple-choice commonsense reasoning | Winogrande | Winogrande Accuracy | 74 | 18 |
| Commonsense Reasoning | Winogrande | HS (Head-to-Head Score) | 47.75 | 17 |
| Zero-shot Prediction | Winogrande | Accuracy | 69.06 | 17 |
| Commonsense Reasoning | Winogrande | Accuracy | 88.8 | 16 |
| Common Sense Reasoning | WinoGrande (dev) | Accuracy | 79.2 | 16 |
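Most rows above report accuracy, but not on a single scale: one entry is a fraction (0.8624) while the rest are percentages, and one row reports MAE instead. The sketch below shows how these two metrics are conventionally computed and how a fractional accuracy could be rescaled for side-by-side reading. It assumes each WinoGrande item is a two-option choice with the predicted and gold options encoded as `1` or `2`; the function names, and the `to_percent` heuristic in particular, are hypothetical illustrations and not the method used by any leaderboard listed here.

```python
def accuracy_percent(predictions, labels):
    """Percentage of items whose predicted option equals the gold option."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def mean_absolute_error(estimates, targets):
    """Mean of |estimate - target| over all pairs (the MAE metric)."""
    if not targets or len(estimates) != len(targets):
        raise ValueError("estimates and targets must be non-empty and equal in length")
    return sum(abs(e - t) for e, t in zip(estimates, targets)) / len(targets)


def to_percent(value):
    """Hypothetical heuristic: treat values <= 1 as fractions (e.g. 0.8624) and rescale.

    Only reasonable for accuracies on a two-option task, where a genuine
    percentage at or below 1% is implausible (chance level is 50%).
    """
    return value * 100.0 if value <= 1.0 else value


if __name__ == "__main__":
    # Options encoded as 1 or 2 (assumed encoding for this sketch).
    print(accuracy_percent([1, 2, 2, 1], [1, 2, 1, 1]))
    print(mean_absolute_error([1.0, 2.0], [2.0, 4.0]))
    print(round(to_percent(0.8624), 2))
```

Because chance accuracy on a two-option task is 50%, any reported accuracy should fall between roughly 50 and 100 on the percent scale; an entry well outside that range is a sign that the value is on a different scale or was garbled during extraction.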