
MMMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multilingual Language Understanding | MMMLU | CLCall 76.1 | 30 |
| General Knowledge Evaluation | MMMLU | MMMLU General Knowledge Accuracy 82.25 | 29 |
| Multilingual Language Understanding | MMMLU (Massive Multilingual Language Understanding) | Accuracy 79.5 | 21 |
| Multilingual Language Understanding | MMMLU | Accuracy (Korean) 60.5 | 20 |
| Multilingual Knowledge | MMMLU | Accuracy 87.2 | 18 |
| Multitask Language Understanding | MMMLU Swahili 1.0 (test) | Accuracy 33.38 | 18 |
| Multitask Language Understanding | MMMLU Korean 1.0 (test) | Accuracy 41.94 | 18 |
| Multitask Language Understanding | MMMLU non-EU languages (test) | Accuracy 77.4 | 16 |
| Multitask Language Understanding | MMMLU 24 official EU languages | Overall Score 80.6 | 14 |
| General knowledge | MMMLU | CLCall Score 76.1 | 10 |
| Multilinguality | MMMLU ko, de, es, ja | Average Score 88.9 | 9 |
| Chinese Language Understanding | MMMLU | MMMLU Score 37.08 | 8 |
| Question Answering | MMMLU | Accuracy 36.14 | 8 |
| Multi-task Language Understanding | MMMLU German | Normalized Log Accuracy 60.8 | 4 |
| Language Understanding | MMMLU German 5-shot (test) | Normalized Log Accuracy 61.8 | 3 |
| Multilingual Language Understanding | MMMLU 5-shot | Accuracy 78.94 | 3 |
| Multitask Language Understanding | MMMLU | Normalized Log Accuracy 59.7 | 2 |
| Multilingual Language Understanding | MMMLU | Normalized Log Accuracy (MMMLU) 78.3 | 2 |
Showing 18 of 18 rows