| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy 99.21 | 1,896 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy 87.4 | 711 |
| Sentence Completion | HellaSwag | Accuracy 87.5 | 364 |
| Common sense reasoning | Hellaswag | Accuracy 93.87 | 213 |
| Reasoning | HellaSwag (HS) | HellaSwag Accuracy 91.84 | 209 |
| Multiple Choice Question Answering | HellaSwag | Accuracy 93.59 | 196 |
| Commonsense Inference | HellaSwag | Accuracy 88.97 | 123 |
| Commonsense Reasoning | HellaSwag | Accuracy 86.63 | 97 |
| Commonsense Reasoning | HellaSwag (HS) | HS Accuracy 78.94 | 66 |
| Commonsense Reasoning | HellaSwag | HellaSwag Score 86.86 | 62 |
| Common Sense Reasoning | HellaSwag (test) | Accuracy 83.9 | 56 |
| Commonsense Reasoning | HellaSwag | HellaSwag Score 95.45 | 55 |
| Commonsense Reasoning | HellaSwag (val) | Accuracy 95.3 | 54 |
| Commonsense Reasoning | HellaSwag | HellaSwag Score 86 | 53 |
| Zero-shot Reasoning | HellaSwag | Accuracy 76.3 | 53 |
| Common Sense Reasoning | HellaSwag | Accuracy (acc_n) 95.7 | 47 |
| Commonsense Reasoning | HellaSwag | Accuracy 95.1 | 47 |
| Commonsense Reasoning | Hellaswag | HS Score 50 | 43 |
| Zero-shot Prediction | HellaSwag | Zero-shot HellaSwag Accuracy 76.36 | 43 |
| Common Sense Reasoning | HellaSwag 0-shot | Accuracy 84.4 | 38 |
| Commonsense Reasoning | HellaSwag | Zero-shot Accuracy 60.04 | 36 |
| General Knowledge | HellaSwag | Accuracy 91.7 | 36 |
| Natural Language Understanding | HellaSwag | Accuracy 85.6 | 35 |
| Commonsense Reasoning | HellaSwag 10-shot (test) | Accuracy 82.53 | 34 |
| Multiple Choice Question Answering | HellaSwag | Normalized Accuracy 78.2 | 33 |