| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-task Language Understanding | MMLU | Accuracy 99.7 | 842 |
| Language Understanding | MMLU | Accuracy 96.6 | 756 |
| Multitask Language Understanding | MMLU (test) | Accuracy 92.16 | 303 |
| Multitask Language Understanding | MMLU | Accuracy 89.8 | 206 |
| General Knowledge | MMLU | MMLU General Knowledge Accuracy 91.1 | 170 |
| Language Understanding | MMLU 5-shot (test) | Accuracy 74.2 | 149 |
| Multiple-choice Question Answering | MMLU | Accuracy 97.5 | 148 |
| Language Understanding | MMLU (test) | MMLU Average Accuracy 88 | 136 |
| Language Understanding | MMLU 5-shot | Accuracy 90.58 | 132 |
| General Reasoning | MMLU | MMLU Accuracy 95.1 | 126 |
| Multiple Choice Question Answering | MMLU-Pro | MMLU-Pro Overall Accuracy 84.8 | 116 |
| Language Understanding | MMLU 0-shot | Accuracy 70.46 | 110 |
| Multi-task Language Understanding | MMLU | Accuracy 73 | 101 |
| Multitask Language Understanding | MMLU-Pro | Accuracy 87.1 | 99 |
| Multi-task Language Understanding | MMLU | Accuracy 74.4 | 87 |
| Multi-task Language Understanding | MMLU (test) | Normalized Accuracy 90.46 | 76 |
| Knowledge | MMLU | Accuracy 85.93 | 71 |
| Language Understanding | MMLU-Pro | Accuracy 80.6 | 70 |
| Question Answering | MMLU | Accuracy 88.7 | 62 |
| Multitask Language Understanding | MMLU (val) | Accuracy 63.16 | 58 |
| Question Answering | MMLU-Pro Natural Setting (test) | Accuracy 87.8 | 56 |
| Question Answering | MMLU-Pro | Accuracy 89.1 | 56 |
| Question Answering | MMLU | Test Error Probability 0.141 | 52 |
| General Reasoning | MMLU-Pro | MMLU-Pro General Reasoning Avg@8 Acc 90.1 | 51 |
| Reasoning | MMLU-Pro | Accuracy 90.1 | 50 |