| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Commonsense Reasoning | WinoGrande | Accuracy | 94.1 | 776 |
| Commonsense Reasoning | Winogrande | Accuracy | 85.3 | 231 |
| Commonsense Reasoning | Winogrande | Accuracy | 91.3 | 156 |
| Question Answering | Winogrande (WG) | Accuracy | 72.77 | 98 |
| Commonsense Reasoning | WinoGrande (val) | Accuracy | 73.88 | 87 |
| Reasoning | WinoGrande (WG) | Accuracy | 85.2 | 87 |
| Commonsense Reasoning | Winogrande | Accuracy | 69.77 | 45 |
| Commonsense Reasoning | Winogrande | Accuracy (0-shot) | 73.7 | 42 |
| Commonsense Reasoning | Winogrande | Accuracy | 76.09 | 38 |
| Coreference Resolution | Winogrande | Accuracy | 73.2 | 36 |
| Commonsense Reasoning | WinoGrande standard (test) | Accuracy | 80.2 | 35 |
| Pronoun Resolution | WinoGrande | Accuracy | 89.4 | 35 |
| Zero-shot Accuracy | WinoGrande | Zero-shot Accuracy | 77.3 | 30 |
| Zero-shot Reasoning | WinoGrande | Accuracy | 69 | 23 |
| Commonsense Question Answering | WinoGrande (WG) (val) | Accuracy | 78.3 | 21 |
| LLM Performance Estimation | Winogrande (test) | MAE | 1.027 | 20 |
| Commonsense Reasoning | Winogrande | Accuracy | 0.8624 | 19 |
| Commonsense Reasoning | Winogrande | Accuracy (Pre-Attack) | 73.2 | 19 |
| Commonsense Reasoning | WinoGrande 5-shot | Accuracy | 92.66 | 18 |
| Zero-shot Prediction | Winogrande | Accuracy | 69.06 | 17 |
| Natural Language Understanding | Winogrande | Accuracy | 53.75 | 15 |
| Commonsense Reasoning | WinoGrande 1.0 (test) | Accuracy | 0.8137 | 15 |
| Coreference Resolution | Winogrande XL | Accuracy | 60.5 | 13 |
| Reasoning | Winogrande | Accuracy Improvement | 2.14 | 12 |
| Commonsense Reasoning | Winogrande | LIS | 3.4756 | 10 |