
MMMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multilingual Language Understanding | MMMLU (test) | Accuracy 66.88 | 52 |
| Multilingual Language Understanding | MMMLU | CLCall 76.1 | 30 |
| General Knowledge Evaluation | MMMLU | MMMLU General Knowledge Accuracy 82.25 | 29 |
| Supervised Fine-Tuning | MMMLU Danish (test) | MMLU Score 46.24 | 25 |
| Multilingual Language Understanding | MMMLU (Massive Multilingual Language Understanding) | Accuracy 79.5 | 21 |
| Multilingual Language Understanding | MMMLU | Accuracy (Korean) 60.5 | 20 |
| Multilingual Knowledge | MMMLU | Accuracy 87.2 | 18 |
| Multitask Language Understanding | MMMLU Swahili 1.0 (test) | Accuracy 33.38 | 18 |
| Multitask Language Understanding | MMMLU Korean 1.0 (test) | Accuracy 41.94 | 18 |
| Multitask Language Understanding | MMMLU non-EU languages (test) | Accuracy 77.4 | 16 |
| Multitask Language Understanding | MMMLU 24 official EU languages | Overall Score 80.6 | 14 |
| General knowledge | MMMLU | CLCall Score 76.1 | 10 |
| Supervised Fine-Tuning | MMMLU Marathi (test) | MMMLU Accuracy 34.6 | 9 |
| Multilinguality | MMMLU ko, de, es, ja | Average Score 88.9 | 9 |
| Chinese Language Understanding | MMMLU | MMMLU Score 37.08 | 8 |
| Question Answering | MMMLU | Accuracy 36.14 | 8 |
| Multitask Language Understanding | MMMLU (target) | RPR 57.86 | 5 |
| Language Adherence | MMMLU (target) | RPR 99.49 | 5 |
| Multitask Language Understanding | MMMLU (target) | Accuracy 55.14 | 5 |
| Multi-task Language Understanding | MMMLU German | Normalized Log Accuracy 60.8 | 4 |
| Language Understanding | MMMLU German 5-shot (test) | Normalized Log Accuracy 61.8 | 3 |
| Multilingual Language Understanding | MMMLU 5-shot | Accuracy 78.94 | 3 |
| Multitask Language Understanding | MMMLU | Normalized Log Accuracy 59.7 | 2 |
| Multilingual Language Understanding | MMMLU | Normalized Log Accuracy (MMMLU) 78.3 | 2 |