| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Factual Knowledge | Global-MMLU-Lite | Seen Accuracy: 58.5 | 21 |
| Multiple Choice Question Answering | Global-MMLU Medical | Accuracy (ZH): 89.1 | 17 |
| General Knowledge | Global MMLU Ukrainian (test) | Accuracy (%): 67.03 | 14 |
| Multi-task Language Understanding | Global MMLU-Lite Māori | Accuracy: 54.64 | 10 |
| Multilingual Multiple-Choice Reasoning | Global MMLU 42 languages 1.0 (test) | Average Accuracy: 54.8 | 6 |
| Multilingual General Knowledge | Global MMLU Lite (subset of 18 languages) | Accuracy: 53.73 | 6 |
| Confidence Estimation | Global-MMLU Japanese ja (test) | AUROC: 74 | 5 |
| Confidence Estimation | Global-MMLU Russian (test) | AUROC: 75 | 5 |
| Confidence Estimation | Global-MMLU Spanish es (test) | AUROC: 74 | 5 |
| Language Understanding | Global MMLU Overall | Accuracy: 59.2 | 5 |
| Confidence Estimation | Global-MMLU Japanese | AUROC: 0.72 | 2 |
| Confidence Estimation | Global-MMLU Russian | AUROC: 73 | 2 |
| Confidence Estimation | Global-MMLU Polish | AUROC: 77 | 2 |
| Confidence Estimation | Global-MMLU Spanish | AUROC: 78 | 2 |
| Confidence Estimation | Global-MMLU English | AUROC: 0.75 | 2 |
| Confidence Estimation | Global-MMLU French | AUROC: 0.76 | 2 |
| Confidence Estimation | Global-MMLU all languages average | AUROC: 0.5 | 2 |
| General Reasoning | Global MMLU 15 languages | Macro Accuracy: 54.77 | 2 |
| Cross-lingual Reasoning and Factual Knowledge | Global MMLU (test) | Accuracy (RUS): 23.46 | 2 |