
Multi-task Language Understanding on MMLU Pro (Accuracy)

Best accuracy: 96.8


[Figure: accuracy over time, Apr 23, 2025 to May 8, 2026]
Updated 22d ago

Evaluation Results

Date      Accuracy (%)
2026.04   96.8
2026.04   92
2026.04   91.8
2026.04   91.4
2026.04   91.4
2026.04   90.2
2026.04   87
2026.04   85.6
2026.05   81.11
2026.05   78.86
2026.05   78.63
2026.05   76.07
2026.04   74.4
2026.04   69.9
2026.04   69.8
2026.05   68.28
2026.04   67.2
2026.04   67.2
2026.04   66.5
2025.04   66.3
2026.04   66
2026.04   65.2
2025.04   63.2
2025.04   62.1
2025.09   60.22
2025.09   58.86
2026.04   56.4
2025.04   54
2025.04   51.3
2025.09   49.78
-         48.89
2025.04   48.6
-         46.67
2026.04   46.6
2025.04   45.5
2025.04   45.5
2025.09   40.24
2025.04   36.4
2025.04   33
2025.06   19.6
2026.05   19.3
2025.06   19.1
2025.06   19
2026.05   18.7
2026.05   18.5
2026.05   15.8
2026.05   15.5
2026.05   12.2
2025.06   11.2
2025.06   11
2025.06   10.7
2025.06   10.1
2025.06   9.9
2025.06   9.8
2025.06   7.9
2025.06   7.8
2025.06   7.4