| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Commonsense Reasoning | HellaSwag | Accuracy | 99.21 | 1,460 |
| Common sense reasoning | Hellaswag | Accuracy | 93.87 | 164 |
| Reasoning | HellaSwag (HS) | HellaSwag Accuracy | 86.31 | 142 |
| Sentence Completion | HellaSwag | Accuracy | 87.5 | 133 |
| Multiple Choice Question Answering | HellaSwag | Accuracy | 79.19 | 59 |
| Common Sense Reasoning | HellaSwag (test) | Accuracy | 83.9 | 45 |
| Commonsense Reasoning | HellaSwag | Zero-shot Accuracy | 60.04 | 36 |
| Commonsense Reasoning | HellaSwag 10-shot (test) | Accuracy | 82.53 | 34 |
| Commonsense Reasoning | HellaSwag | Accuracy (Baseline) | 82.94 | 31 |
| Zero-shot Reasoning | HellaSwag | Accuracy | 76.3 | 29 |
| Commonsense Reasoning | HellaSwag | HellaSwag Score | 95.45 | 27 |
| Commonsense Reasoning | HellaSwag (val) | Accuracy | 95.3 | 25 |
| Common Sense Reasoning | HellaSwag 0-shot | Accuracy | 84.4 | 22 |
| Multilingual Commonsense Reasoning | M-Hellaswag | Accuracy (zh) | 79.2 | 21 |
| Commonsense Reasoning | HELLASWAG (test) | Accuracy | 95.6 | 21 |
| LLM Performance Estimation | HellaSwag (test) | MAE (%) | 0.827 | 20 |
| Zero-shot Prediction | HellaSwag | Zero-shot HellaSwag Accuracy | 57.14 | 17 |
| Commonsense reasoning | HellaSwag 1.0 (test) | Accuracy | 85.6 | 17 |
| Commonsense Reasoning | Hellaswag Multilingual (test) | Accuracy | 83.1 | 16 |
| Commonsense Reasoning | Hellaswag non-EU languages (test) | Accuracy | 80.4 | 16 |
| Science Completion | HellaSwag | Accuracy | 95.2 | 16 |
| Commonsense Reasoning | HellaSwag published (test) | Accuracy | 82.35 | 15 |
| Natural Language Understanding | HellaSwag | Accuracy | 40.89 | 15 |
| Sentence completion | HellaSwag (test) | Accuracy | 72.35 | 15 |
| Commonsense Reasoning | Hellaswag 24 official EU languages | Accuracy | 84.3 | 14 |