
Global MMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Factual knowledge | Global-MMLU-Lite | Seen Accuracy: 58.5 | 21 |
| Multiple Choice Question Answering | Global-MMLU Medical | Accuracy (ZH): 89.1 | 17 |
| General Knowledge | Global MMLU Ukrainian (test) | Accuracy (%): 67.03 | 14 |
| Multi-task Language Understanding | Global MMLU-Lite Māori | Accuracy: 54.64 | 10 |
| Multilingual Multiple-Choice Reasoning | Global MMLU 42 languages 1.0 (test) | Average Accuracy: 54.8 | 6 |
| Multilingual General Knowledge | Global MMLU Lite (subset of 18 languages) | Accuracy: 53.73 | 6 |
| Confidence Estimation | Global-MMLU Japanese ja (test) | AUROC: 74 | 5 |
| Confidence Estimation | Global-MMLU Russian (test) | AUROC: 75 | 5 |
| Confidence Estimation | Global-MMLU Spanish es (test) | AUROC: 74 | 5 |
| Language Understanding | Global MMLU Overall | Accuracy: 59.2 | 5 |
| Confidence Estimation | Global-MMLU Japanese | AUROC: 0.72 | 2 |
| Confidence Estimation | Global-MMLU Russian | AUROC: 73 | 2 |
| Confidence Estimation | Global-MMLU Polish | AUROC: 77 | 2 |
| Confidence Estimation | Global-MMLU Spanish | AUROC: 78 | 2 |
| Confidence Estimation | Global-MMLU English | AUROC: 0.75 | 2 |
| Confidence Estimation | Global-MMLU French | AUROC: 0.76 | 2 |
| Confidence Estimation | Global-MMLU all languages average | AUROC: 0.5 | 2 |
| General Reasoning | Global MMLU 15 languages | Macro Accuracy: 54.77 | 2 |
| Cross-lingual Reasoning and Factual Knowledge | Global MMLU (test) | Accuracy (RUS): 23.46 | 2 |