MMLU

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multi-task Language Understanding | MMLU | Accuracy 99.7 | 881 |
| Language Understanding | MMLU | Accuracy 96.6 | 844 |
| Multitask Language Understanding | MMLU | Accuracy 91.5 | 520 |
| Multi-task Language Understanding | MMLU | MMLU Accuracy 98.5 | 442 |
| Multi-task Language Understanding | MMLU | Accuracy 94.7 | 353 |
| Multitask Language Understanding | MMLU (test) | Accuracy 92.16 | 312 |
| General Knowledge | MMLU | MMLU General Knowledge Accuracy 91.2 | 307 |
| Multitask Language Understanding | MMLU | Accuracy 86.3 | 263 |
| Multitask Language Understanding | MMLU-Pro | Accuracy 89.31 | 248 |
| Reasoning | MMLU-Pro | Accuracy 92.86 | 241 |
| Multiple-choice Question Answering | MMLU | Accuracy 97.5 | 210 |
| General Reasoning | MMLU-Pro | Accuracy 82.3 | 201 |
| Performance Estimation | MMLU | MAE 0.002 | 198 |
| General Reasoning | MMLU | MMLU Accuracy 95.1 | 180 |
| Language Understanding | MMLU (test) | MMLU Average Accuracy 88 | 167 |
| Knowledge | MMLU | Accuracy 85.93 | 161 |
| Language Understanding | MMLU 5-shot | Accuracy 90.58 | 153 |
| Language Understanding | MMLU 5-shot (test) | Accuracy 74.2 | 149 |
| Language Understanding | MMLU | MMLU Accuracy 90 | 147 |
| Multi-task Language Understanding | MMLU | Accuracy 77.6 | 136 |
| Language Understanding | MMLU | MMLU Accuracy 87.56 | 132 |
| Multiple Choice Question Answering | MMLU-Pro | MMLU-Pro Overall Accuracy 96.5 | 130 |
| Massive Multitask Language Understanding | MMLU | Accuracy 83.34 | 129 |
| General Knowledge Evaluation | MMLU | MMLU Accuracy 83.66 | 127 |
| Knowledge Reasoning | MMLU-Pro | Accuracy 91.43 | 120 |
Showing 25 of 859 rows
...