| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multilingual Language Understanding | MMMLU | CLCall | 76.1 | 30 |
| General Knowledge Evaluation | MMMLU | MMMLU General Knowledge Accuracy | 82.25 | 29 |
| Multilingual Language Understanding | MMMLU (Massive Multilingual Language Understanding) | Accuracy | 79.5 | 21 |
| Multilingual Language Understanding | MMMLU | Accuracy (Korean) | 60.5 | 20 |
| Multilingual Knowledge | MMMLU | Accuracy | 87.2 | 18 |
| Multitask Language Understanding | MMMLU Swahili 1.0 (test) | Accuracy | 33.38 | 18 |
| Multitask Language Understanding | MMMLU Korean 1.0 (test) | Accuracy | 41.94 | 18 |
| Multitask Language Understanding | MMMLU non-EU languages (test) | Accuracy | 77.4 | 16 |
| Multitask Language Understanding | MMMLU 24 official EU languages | Overall Score | 80.6 | 14 |
| General knowledge | MMMLU | CLCall Score | 76.1 | 10 |
| Multilinguality | MMMLU ko, de, es, ja | Average Score | 88.9 | 9 |
| Chinese Language Understanding | MMMLU | MMMLU Score | 37.08 | 8 |
| Question Answering | MMMLU | Accuracy | 36.14 | 8 |
| Multi-task Language Understanding | MMMLU German | Normalized Log Accuracy | 60.8 | 4 |
| Language Understanding | MMMLU German 5-shot (test) | Normalized Log Accuracy | 61.8 | 3 |
| Multilingual Language Understanding | MMMLU 5-shot | Accuracy | 78.94 | 3 |
| Multitask Language Understanding | MMMLU | Normalized Log Accuracy | 59.7 | 2 |
| Multilingual Language Understanding | MMMLU | Normalized Log Accuracy (MMMLU) | 78.3 | 2 |