
Accuracy on MMLU (Multi-task Language Understanding)

77.6 Accuracy

Base

[Chart: accuracy over time, May 26, 2025 to May 24, 2026]
Updated 8d ago

Evaluation Results

Date       Accuracy
2026.05    77.6
2026.05    73.1
2026.02    73
2026.02    72.9
2026.02    72.9
2026.02    72.8
2026.02    72.8
2026.02    72.8
2026.02    72.7
2026.02    72.7
2026.02    72.3
2026.02    72.3
2026.02    71.9
2026.02    71.7
2026.05    71.5
2026.02    70.8
2026.02    70.6
2026.02    70.5
2026.02    70.5
2025.05    69
2026.05    68.8
2026.02    68.5
2026.05    68.5
2026.05    68.1
2025.05    68
2026.05    68
2026.05    67.8
2026.05    67.4
2026.02    67
2025.05    67
2025.05    66
2026.05    65.5
2026.02    65.2
2025.05    65
2025.05    64.5
2025.05    64
2025.05    64
2025.05    63
2026.02    62.2
2025.05    62
2026.05    59.9
2026.02    58.8
2026.05    58.5
2026.02    58
2026.02    57.8
2026.02    56.7
2026.02    56.3
2026.02    55.9
2026.02    55.7
2026.02    55.1
2026.02    54.9
2026.02    54.3
2026.02    54.3
2026.05    53.7
2026.05    53.6
2026.02    53.5
2026.02    52.5
2026.02    52.4
2026.02    52.4
2026.02    52.3
2026.02    51.7
2026.02    51.3
2026.05    50.4
2026.02    49
2026.02    48.8
2026.02    48.6
2025.09    40.9
2026.02    40.2
2026.02    39.8
2026.05    39.5
2026.05    34.8
2026.05    34.7
2026.02    32.5
2026.02    30.6
2026.05    30.4
2026.02    29.6
2025.09    28.2
2026.02    27.5
2025.09    27.2
2026.02    27.2
2026.05    27.1
2025.09    26.9
2026.02    26.8
2026.02    26.6
2026.02    26.4
2026.02    26.3
2026.02    26.3
2026.05    26.19
2026.02    26.1
2025.09    25.5
2026.05    25.02
2026.02    25
2026.02    24.9
2025.09    24.8
2026.02    24.8
2026.02    24.7
2026.02    24.6
2025.08    24.5
2025.08    24.5
2026.02    24.4
Showing 100 of 136 rows