
Language Understanding on MMLU-Redux

Best accuracy: 85.75

Instruct Model (Q1)

[Figure: results plotted over time, Jan 27, 2026 to May 14, 2026]

Evaluation Results

Date     Accuracy  Col. 3  Col. 4  Col. 5
2026.01  85.75     0.1177  0.1071  -0.7399
2026.01  85.75     0.1101  0.0612  -0.7474
2026.05  85.7      -       -       -
2026.02  83.92     -       -       -
2026.02  83.16     -       -       -
2026.01  81.12     0.1084  0.0152  -0.7028
2026.05  80.6      -       -       -
2026.01  80.26     0.1653  0.161   -0.6373
2026.01  80.26     0.1498  0.0965  -0.6528
2026.05  79.5      -       -       -
2026.02  79.28     -       -       -
2026.02  78.23     -       -       -
2026.05  77.3      -       -       -
2026.01  73.54     0.1379  0.0279  -0.5976
2026.01  70.44     0.232   0.2097  -0.4724
2026.01  70.26     0.2475  0.2461  -0.4552
2026.05  70.2      -       -       -
2026.05  66        -       -       -
2026.01  64.46     0.1714  0.019   -0.4732
2026.01  54.58     0.3762  0.3742  -0.1696
2026.01  54.58     0.3571  0.3485  -0.1887
2026.01  53.16     0.2005  0.0277  -0.331
2026.05  50        -       -       -
2026.05  45.3      -       -       -
2026.05  43.6      -       -       -
2026.05  43.6      -       -       -
2026.05  42.6      -       -       -
2026.05  41        -       -       -
2026.05  8.1       -       -       -
2026.01  -         0.2356  0.0546  -0.3102
2026.01  -         0.2365  0.0184  -0.3093
2026.01  -         0.2294  0.0178  -0.3164
2026.01  -         0.1744  0.0591  -0.5282
2026.01  -         0.1879  0.0389  -0.5147
2026.01  -         0.1696  0.0331  -0.533
2026.01  -         0.1299  0.0563  -0.6727
2026.01  -         0.1403  0.0287  -0.6623
2026.01  -         0.1239  0.0117  -0.6788
2026.01  -         0.1011  0.0679  -0.7565
2026.01  -         0.1086  0.0303  -0.749
2026.01  -         0.0986  0.0411  -0.7589
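The headline figure on this page is an accuracy score. On a multiple-choice benchmark such as MMLU-Redux, accuracy is typically reported as the percentage of questions for which the model's selected option matches the reference answer. The sketch below illustrates that computation under stated assumptions: `accuracy_percent` is a hypothetical helper, it assumes exactly one gold option letter per question, and it is not necessarily how any entry in the table above was scored.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of items whose predicted option matches the reference.

    Hypothetical helper for illustration only. Assumes one gold option letter
    (e.g. "A".."D") per question; comparison ignores case and surrounding spaces.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count exact option-letter matches after normalizing case and whitespace.
    correct = sum(
        p.strip().upper() == r.strip().upper()
        for p, r in zip(predictions, references)
    )
    # Scale to a percentage and round to two decimals, like the scores listed above.
    return round(100.0 * correct / len(references), 2)


if __name__ == "__main__":
    preds = ["A", "C", "B", "D"]
    gold = ["A", "C", "D", "D"]
    print(accuracy_percent(preds, gold))  # 75.0
```

Three of the four hypothetical predictions match, giving 75.0. A real evaluation harness would also need a rule for extracting the option letter from the model's raw output, which this sketch leaves out.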