
MMLU-Pro

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Hallucination evaluation | MMLU-Pro Law (test) | HALL% | 12.1 | 21 |
| Academic Reasoning | MMLU-Pro | Pass@1 | 50.7 | 15 |
| General Knowledge Reasoning | MMLU-Pro (test) | Accuracy | 37.72 | 10 |
| LLM Routing | MMLU Pro Social Sciences (Out-of-Domain) | LPM | 59.2 | 7 |
| LLM Routing | MMLU Pro Humanities Out-of-Domain | LPM | 51.74 | 7 |
| General Knowledge Task | MMLU-Pro (test) | Accuracy | 56.3 | 6 |
| General Question Answering | MMLU-Pro (test) | Mean Accuracy | 79.55 | 4 |
| General | MMLU-Pro (test) | Accuracy | 83.76 | 4 |
| General Capability | MMLU-Pro OpenR1-Math Harder | Accuracy | 71.3 | 3 |
| General question answering | MMLU-Pro (test) | Optimization Token Usage | 595 | 3 |
| General | MMLU-Pro (test) | Optimization Token Usage (k) | 778 | 3 |
| Multiple-choice Question Answering | MMLU-Pro Overall (test) | Mean Entropy (R1) | 0.2456 | 3 |
| Language Understanding | MMLU-Pro v1 (test) | Accuracy | 44.1 | 3 |
| Question Answering | MMLU-Pro Adversarial Setting (test) | Accuracy | 98.9 | 2 |
| Scientific Reasoning | MMLU-Pro | Mean@1 | 75.2 | 2 |