| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---:|---:|
| Commonsense Reasoning | HellaSwag | Accuracy | 99.21 | 1,891 |
| Commonsense Reasoning | HellaSwag | HellaSwag Accuracy | 84 | 350 |
| Sentence Completion | HellaSwag | Accuracy | 87.5 | 276 |
| Common sense reasoning | Hellaswag | Accuracy | 93.87 | 213 |
| Reasoning | HellaSwag (HS) | HellaSwag Accuracy | 86.31 | 162 |
| Multiple Choice Question Answering | HellaSwag | Accuracy | 93.59 | 93 |
| Commonsense Inference | HellaSwag | Accuracy | 88.97 | 91 |
| Commonsense Reasoning | HellaSwag | HellaSwag Score | 95.45 | 55 |
| Zero-shot Reasoning | HellaSwag | Accuracy | 76.3 | 48 |
| Commonsense Reasoning | HellaSwag | Accuracy | 95.1 | 47 |
| Common Sense Reasoning | HellaSwag (test) | Accuracy | 83.9 | 45 |
| Commonsense Reasoning | HellaSwag | Zero-shot Accuracy | 60.04 | 36 |
| Commonsense Reasoning | HellaSwag 10-shot (test) | Accuracy | 82.53 | 34 |
| Common Sense Reasoning | HellaSwag 0-shot | Accuracy | 84.4 | 34 |
| Commonsense Reasoning | HellaSwag | Accuracy (Baseline) | 82.94 | 31 |
| Commonsense Reasoning | Hellaswag | HS Score | 44.5 | 28 |
| General Knowledge | HellaSwag | Accuracy | 91.7 | 27 |
| Commonsense Reasoning | HellaSwag (val) | Accuracy | 95.3 | 25 |
| Multilingual Commonsense Reasoning | M-Hellaswag | Accuracy (zh) | 79.2 | 21 |
| Commonsense Reasoning | HELLASWAG (test) | Accuracy | 95.6 | 21 |
| LLM Performance Estimation | HellaSwag (test) | MAE (%) | 0.827 | 20 |
| Common Sense Reasoning | HellaSwag (dev) | Accuracy | 95.4 | 20 |
| Commonsense | HellaSwag 10-shot | Accuracy (10-shot) | 64.85 | 19 |
| Zero-shot Prediction | HellaSwag | Zero-shot HellaSwag Accuracy | 57.14 | 17 |
| Commonsense reasoning | HellaSwag 1.0 (test) | Accuracy | 85.6 | 17 |