Multitask Language Understanding on MMLU-Pro

87.1 Accuracy

GPT-5 (High)

[Chart: accuracy over time, May 2024 to Apr 2026]
Updated 9d ago

Evaluation Results

Method  Links
87.1
86.88
86.2
86.2
2026.02
86
85
2026.02
83.8
83.7
2026.02
83.7
83.6
80.9
2026.02
80.6
2026.02
80.6
2024.07
77
2026.01
76.1
2026.02
75.25
2024.07
74
2024.07
73.3
2026.01
72.67
2026.04
69.6
2026.04
66.76
2024.07
66.4
2026.01
66
2026.02
65.15
2024.07
64.8
2026.02
63.8
2026.01
63.69
2026.02
63.14
2026.02
63
2026.01
62.9
2026.02
62.8
2024.07
62.7
2026.01
62.5
2026.01
62.3
61.9
2026.02
61.5
2026.02
60.82
2026.02
60.4
2026.02
60.1
2026.02
59.88
2026.04
56.43
2024.07
56.3
2026.02
56.3
2026.04
56.12
2025.09
53.3
2026.01
52.01
50.4
2024.07
49.2
2025.04
49.19
48.62
2025.04
48.44
2024.07
48.3
2025.09
47.7
2026.02
44.47
43.4
42.1
2025.07
41.04
41
40.9
40.9
2025.07
40.87
40.5
2026.04
39.04
2026.04
38.96
2025.09
38.9
2025.07
38.78
2025.07
37.24
2024.07
36.9
2025.07
36.47
2026.02
35.83
2026.02
35.77
2026.01
34.91
2026.01
34.75
2025.07
34.07
33.6
2025.09
32.7
2025.12
32.1
2025.12
31.9
2025.07
31.6
2025.09
31.5
2025.02
31.3
2025.12
31
30.9
30.8
2025.02
30.5
30.42
2025.02
30.4
2025.02
29.9
2025.02
29.3
2026.01
29.26
2025.02
28.8
2025.12
28.6
2025.06
28.37
2025.07
27.14
2025.09
26.3
2025.12
24.9
2026.02
24.81
2025.09
22.8
2025.07
20.51
2025.12
20.5
Showing 100 of 118 rows