| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Common Sense Reasoning | SWAG | Accuracy 92.29 | 24 |
| Commonsense Reasoning | SWAG (test) | Accuracy 0.9412 | 13 |
| Commonsense Reasoning | SWAG (dev) | Accuracy 91.2 | 11 |
| Ranking correlation with full dataset evaluation | SWAG | Kendall Correlation 0.93 | 10 |
| Commonsense Reasoning | SWAG (val) | Accuracy 85.5 | 9 |
| Commonsense Reasoning | SWAG In-Domain (test) | Accuracy 83.14 | 8 |
| Natural Language Understanding | SWAG (dev) | Accuracy 92.59 | 6 |
| Grounded Commonsense Inference | SWAG (test) | Accuracy 88 | 6 |
| Grounded Commonsense Inference | SWAG (dev) | Accuracy 86.6 | 4 |
| Multiple-Choice | Swag (test) | Accuracy 80.85 | 3 |
| Question Answering | SWAG (dev) | Accuracy 0.908 | 3 |