
MMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-task Language Understanding | MMLU | Accuracy 99.7 | 876 |
| Language Understanding | MMLU | Accuracy 96.6 | 825 |
| Multitask Language Understanding | MMLU | Accuracy 91.5 | 413 |
| Multi-task Language Understanding | MMLU | Accuracy 94.7 | 321 |
| Multitask Language Understanding | MMLU (test) | Accuracy 92.16 | 303 |
| General Knowledge | MMLU | MMLU General Knowledge Accuracy 91.2 | 234 |
| Multiple-choice Question Answering | MMLU | Accuracy 97.5 | 185 |
| Language Understanding | MMLU (test) | MMLU Average Accuracy 88 | 163 |
| General Reasoning | MMLU | MMLU Accuracy 95.1 | 156 |
| Language Understanding | MMLU 5-shot (test) | Accuracy 74.2 | 149 |
| Knowledge | MMLU | Accuracy 85.93 | 136 |
| Language Understanding | MMLU 5-shot | Accuracy 90.58 | 132 |
| Multiple Choice Question Answering | MMLU-Pro | MMLU-Pro Overall Accuracy 96.5 | 119 |
| Multitask Language Understanding | MMLU-Pro | Accuracy 87.1 | 118 |
| Massive Multitask Language Understanding | MMLU | Accuracy 69.49 | 117 |
| General Reasoning | MMLU-Pro | Accuracy 82.3 | 114 |
| Multi-task Language Understanding | MMLU | MMLU Score 86.4 | 112 |
| Multi-task Language Understanding | MMLU | Accuracy 73 | 111 |
| Language Understanding | MMLU 0-shot | Accuracy 70.46 | 110 |
| Language Understanding | MMLU | MMLU Score 73.02 | 98 |
| Reasoning | MMLU-Pro | Accuracy 92.86 | 95 |
| Language Understanding | MMLU-Pro | Accuracy 80.6 | 87 |
| Language Understanding | MMLU | MMLU Accuracy 87.56 | 77 |
| Multi-task Language Understanding | MMLU (test) | Normalized Accuracy 90.46 | 76 |
| Multiple-choice Question Answering | MMLU 5-shot | Accuracy 73.4 | 73 |
Showing 25 of 603 rows
...