
Multi-task Knowledge and Reasoning on MMLU-Pro

73.85 Average Score @1

STAR-1-mix

[Chart: Average Score @1, Feb 14, 2026]
Updated 4d ago

Evaluation Results

Date     Average Score @1
2026.02  73.85
2026.02  72.83
2026.02  72.67
2026.02  72.47
2026.02  72.26
2026.02  70.79
2026.02  70.2
2026.02  69.46
2026.02  69.45
2026.02  69.07
2026.02  68.99
2026.02  67.54
2026.02  66.51
2026.02  64.73
2026.02  64.72
2026.02  64.58
2026.02  63.64
2026.02  63.64