| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-task Language Understanding | MMLU | Accuracy 99.7 | 881 |
| Language Understanding | MMLU | Accuracy 96.6 | 844 |
| Multitask Language Understanding | MMLU | Accuracy 91.5 | 520 |
| Multi-task Language Understanding | MMLU | MMLU Accuracy 98.5 | 442 |
| Multi-task Language Understanding | MMLU | Accuracy 94.7 | 353 |
| Multitask Language Understanding | MMLU (test) | Accuracy 92.16 | 312 |
| General Knowledge | MMLU | MMLU General Knowledge Accuracy 91.2 | 307 |
| Multitask Language Understanding | MMLU | Accuracy 86.3 | 263 |
| Multitask Language Understanding | MMLU-Pro | Accuracy 89.31 | 248 |
| Reasoning | MMLU-Pro | Accuracy 92.86 | 241 |
| Multiple-choice Question Answering | MMLU | Accuracy 97.5 | 210 |
| General Reasoning | MMLU-Pro | Accuracy 82.3 | 201 |
| Performance Estimation | MMLU | MAE 0.002 | 198 |
| General Reasoning | MMLU | MMLU Accuracy 95.1 | 180 |
| Language Understanding | MMLU (test) | MMLU Average Accuracy 88 | 167 |
| Knowledge | MMLU | Accuracy 85.93 | 161 |
| Language Understanding | MMLU 5-shot | Accuracy 90.58 | 153 |
| Language Understanding | MMLU 5-shot (test) | Accuracy 74.2 | 149 |
| Language Understanding | MMLU | MMLU Accuracy 90 | 147 |
| Multi-task Language Understanding | MMLU | Accuracy 77.6 | 136 |
| Language Understanding | MMLU | MMLU Accuracy 87.56 | 132 |
| Multiple Choice Question Answering | MMLU-Pro | MMLU-Pro Overall Accuracy 96.5 | 130 |
| Massive Multitask Language Understanding | MMLU | Accuracy 83.34 | 129 |
| General Knowledge Evaluation | MMLU | MMLU Accuracy 83.66 | 127 |
| Knowledge Reasoning | MMLU-Pro | Accuracy 91.43 | 120 |