| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-task Language Understanding | MMLU | Accuracy 99.7 | 876 |
| Language Understanding | MMLU | Accuracy 96.6 | 825 |
| Multitask Language Understanding | MMLU | Accuracy 91.5 | 413 |
| Multi-task Language Understanding | MMLU | Accuracy 94.7 | 321 |
| Multitask Language Understanding | MMLU (test) | Accuracy 92.16 | 303 |
| General Knowledge | MMLU | MMLU General Knowledge Accuracy 91.2 | 234 |
| Multiple-choice Question Answering | MMLU | Accuracy 97.5 | 185 |
| Language Understanding | MMLU (test) | MMLU Average Accuracy 88 | 163 |
| General Reasoning | MMLU | MMLU Accuracy 95.1 | 156 |
| Language Understanding | MMLU 5-shot (test) | Accuracy 74.2 | 149 |
| Knowledge | MMLU | Accuracy 85.93 | 136 |
| Language Understanding | MMLU 5-shot | Accuracy 90.58 | 132 |
| Multiple Choice Question Answering | MMLU-Pro | MMLU-Pro Overall Accuracy 96.5 | 119 |
| Multitask Language Understanding | MMLU-Pro | Accuracy 87.1 | 118 |
| Massive Multitask Language Understanding | MMLU | Accuracy 69.49 | 117 |
| General Reasoning | MMLU-Pro | Accuracy 82.3 | 114 |
| Multi-task Language Understanding | MMLU | MMLU Score 86.4 | 112 |
| Multi-task Language Understanding | MMLU | Accuracy 73 | 111 |
| Language Understanding | MMLU 0-shot | Accuracy 70.46 | 110 |
| Language Understanding | MMLU | MMLU Score 73.02 | 98 |
| Reasoning | MMLU-Pro | Accuracy 92.86 | 95 |
| Language Understanding | MMLU-Pro | Accuracy 80.6 | 87 |
| Language Understanding | MMLU | MMLU Accuracy 87.56 | 77 |
| Multi-task Language Understanding | MMLU (test) | Normalized Accuracy 90.46 | 76 |
| Multiple-choice Question Answering | MMLU 5-shot | Accuracy 73.4 | 73 |