Scientific Reasoning on MMLU-Pro (pass@1, avg@10)

80.3 Pass@1

o1-mini

Feb 9, 2026
Updated 3d ago

Evaluation Results

Method (date)   Pass@1   Links
2026.02         80.3     -
2026.02         75.6     -
2026.02         74.6     -
2026.02         71.9     -
2026.02         71.4     -
2026.02         71       -
2026.02         70.4     -
2026.02         70.3     -
2026.02         67.5     -
2026.02         67.3     -
2026.02         67.2     -
2026.02         66.2     -
2026.02         63.5     -
2026.02         62.8     -
2026.02         61.2     -
2026.02         58       -
2026.02         50.6     -
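
The page does not define its metric, so the following is only a sketch of one common reading of "pass@1, avg@10": each question is answered 10 times, each sample is scored right or wrong, and the per-question accuracies are averaged into a percentage. The function name `pass_at_1_avg` and the input layout are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean


def pass_at_1_avg(samples_correct):
    """Average pass@1 over repeated samples, as a percentage.

    samples_correct: one list per question, holding one boolean per
    sampled answer (assumed here to be 10 samples per question).
    """
    # Accuracy of a single question = fraction of its samples that are correct.
    per_question = [
        mean(1.0 if ok else 0.0 for ok in question)
        for question in samples_correct
    ]
    # Benchmark score = mean of the per-question accuracies, scaled to 0-100.
    return 100.0 * mean(per_question)


# Two questions, 10 samples each: the first always right, the second right half the time.
example = [[True] * 10, [True] * 5 + [False] * 5]
score = pass_at_1_avg(example)  # 75.0
```

Averaging over several samples reduces the run-to-run noise of a single-sample pass@1 score, which matters when neighbouring entries in the table differ by only a few tenths of a point.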