
Coding on MBPP (Accuracy)

98.4 Accuracy

GPT-5

[Chart: accuracy over time, May 20, 2025 to May 17, 2026]
Updated 9d ago

Evaluation Results

Date      Accuracy
2026.01   98.4
2025.05   85.4
2025.05   85.2
2026.05   84.8
2026.01   83.7
2026.01   83.6
2026.01   83.6
2026.05   83.3
2026.01   82.6
2026.05   82.5
2026.01   81.8
2025.05   81.8
2025.05   81.7
2026.05   81.6
2026.05   80.2
2025.05   79.63
2025.05   79.28
2026.01   79
2026.01   78.7
2025.05   78.49
2025.05   77.86
2026.01   77.7
2026.03   77.2
2026.03   77
2026.03   76.3
2026.03   76.3
2026.01   75.9
2026.03   75.9
2026.01   75.4
2026.03   75.1
2026.03   74.5
2026.03   73
2026.05   73
2026.03   72.5
2026.01   72.22
2026.01   72.1
2026.03   72
2026.05   71.8
2026.05   71.2
2026.03   70.8
2025.05   70.8
2026.01   70.37
2026.05   70.3
2026.03   70.2
2026.03   70.2
2026.03   69.9
2026.03   69.6
2026.05   69.6
2026.03   69.5
2026.03   69.5
2025.05   68.8
2026.01   68.78
2026.01   68.7
2026.03   68.5
2026.01   68.4
2026.03   68
2026.01   67.7
2026.03   67.7
2026.05   66.9
2026.03   65.9
2026.03   65.2
2026.03   64.8
2026.01   64.7
2026.01   64.2
2026.03   64
2026.03   63.1
2026.05   62.7
2026.03   62.3
2026.05   62.2
2026.03   61.8
2026.03   61
2026.03   60.5
2026.03   60
2026.03   59.5
2026.02   59.3
2026.05   58.7
2026.03   58.5
2026.03   57.9
2026.01   57.7
2026.05   56.9
2026.03   56.8
2026.03   56.4
2026.05   55.8
2026.02   55.3
2026.03   55.2
2026.05   54.4
2026.01   53.97
2026.05   53.4
2026.02   52.8
2026.01   52.6
2026.05   52
2026.02   51.9
2026.02   51.6
2026.03   51.1
2026.03   51.1
2026.02   51
2026.03   50.6
2026.03   50.6
2025.10   50.6
2026.03   50.3
Showing 100 of 145 rows