
Multi-task Knowledge and Reasoning on MMLU-Pro

73.85 Average Score @1

STAR-1-mix
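The page does not define "Average Score @1". One common reading is single-attempt accuracy: the percentage of questions whose first answer matches the gold option. The Python sketch below assumes that reading. The function name `score_at_1` and the `(category, prediction, gold)` record layout are hypothetical. Whether this leaderboard averages over all questions (micro) or over MMLU-Pro's subject categories (macro) is also an assumption, so the sketch supports both.

```python
from collections import defaultdict


def score_at_1(records, macro=False):
    """Hypothetical @1 scorer, assuming only the first answer per question counts.

    records: iterable of (category, first_prediction, gold_answer) tuples.
    macro=False averages over all questions; macro=True averages per-category
    accuracies (which one the leaderboard uses is an assumption).
    Returns a percentage.
    """
    per_cat = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for category, pred, gold in records:
        per_cat[category][0] += int(pred == gold)
        per_cat[category][1] += 1
    if not per_cat:
        raise ValueError("no records to score")
    if macro:
        accs = [correct / total for correct, total in per_cat.values()]
        return 100 * sum(accs) / len(accs)
    correct = sum(c for c, _ in per_cat.values())
    total = sum(t for _, t in per_cat.values())
    return 100 * correct / total


records = [("math", "A", "A"), ("math", "B", "C"), ("law", "D", "D")]
print(score_at_1(records))              # micro: 2 of 3 correct
print(score_at_1(records, macro=True))  # macro: mean of 50% and 100%
```

The two averages diverge whenever categories differ in size. The choice therefore matters when comparing scores from different sources.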

[Chart: Average Score @1 over time, Oct 2025 to Feb 2026]

Evaluation Results

Date      Average Score @1
2026.02   73.85
2026.02   72.83
2026.02   72.67
2026.02   72.47
2026.02   72.26
2026.02   70.79
2026.02   70.20
2026.02   69.46
2026.02   69.45
2026.02   69.07
2026.02   68.99
2026.02   67.54
2026.02   66.51
2026.02   64.73
2026.02   64.72
2026.02   64.58
2026.02   63.64
2026.02   63.64
2025.10   32.30
2025.10   32.20
2025.10   32.10