
Language Understanding and Reasoning on MMLU Pro

Overall Score: 67.8

AIMMerging

May 10, 2026
Updated 22d ago

Evaluation Results

Method  Links
2026.05   67.8   67.2   -0.007   72.3
2026.05   66.5   65.7   -0.022   74.3
2026.05   65.3   64.4   -0.028   71.7
2026.05   62.9   61.9   -0.011   67.9
2026.05   60.4   59.6   -0.028   68.2
2026.05   60.4   59.9   -0.045   72.6
2026.05   59.6   58.9   -0.033   68.5
2026.05   58.1   56.9   -0.006   62.7
2026.05   55.4   54.6   -0.11    62.9
2026.05   54.9   54.3   -0.054   67.2
2026.05   54.7   53.4   -0.034   63.5
2026.05   51.6   50.7   -0.035   63
2026.05   41.8   40.8   -0.009   47.5
2026.05   40     38.6   -0.031   46.6
2026.05   38.5   37.5   -0.036   47.4
2026.05   37     36     -0.029   47.4
2026.05   27.1   26     -0.011   35.1
2026.05   24.8   24     -0.023   28.4
2026.05   24.7   23.8   -0.023   29.6
2026.05   24.4   23.4   -0.016   34.3