
Coding on MBPP (Accuracy)

98.4 Accuracy

GPT-5

[Chart: MBPP accuracy over time, Oct 10, 2025 to Mar 13, 2026; y-axis ticks 38.184, 53.817, 69.45, 85.083]
Updated 3d ago

Evaluation Results

Date      Accuracy
2026.01   98.4
2026.01   83.7
2026.01   83.6
2026.01   83.6
2026.01   82.6
2026.01   81.8
2026.01   79
2026.01   78.7
2026.01   77.7
2026.03   77.2
2026.03   77
2026.03   76.3
2026.03   76.3
2026.01   75.9
2026.03   75.9
2026.01   75.4
2026.03   75.1
2026.03   74.5
2026.03   73
2026.03   72.5
2026.01   72.22
2026.01   72.1
2026.03   72
2026.03   70.8
2026.01   70.37
2026.03   70.2
2026.03   70.2
2026.03   69.9
2026.03   69.6
2026.03   69.5
2026.03   69.5
2026.01   68.78
2026.01   68.7
2026.03   68.5
2026.01   68.4
2026.03   68
2026.01   67.7
2026.03   67.7
2026.03   65.9
2026.03   65.2
2026.03   64.8
2026.01   64.7
2026.01   64.2
2026.03   64
2026.03   63.1
2026.03   62.3
2026.03   61.8
2026.03   61
2026.03   60.5
2026.03   60
2026.03   59.5
2026.02   59.3
2026.03   58.5
2026.03   57.9
2026.01   57.7
2026.03   56.8
2026.03   56.4
2026.02   55.3
2026.03   55.2
2026.01   53.97
2026.02   52.8
2026.01   52.6
2026.02   51.9
2026.02   51.6
2026.03   51.1
2026.03   51.1
2026.02   51
2026.03   50.6
2026.03   50.6
2025.10   50.6
2026.03   50.3
2026.03   49.9
2026.03   49.2
2025.10   49
2026.03   48.9
2026.03   48.7
2025.10   46.7
2025.10   46.3
2025.10   46.3
2026.03   46.2
2025.10   45.9
2025.10   45.9
2025.10   45.5
2025.10   45.5
2026.03   45.1
2025.10   44.8
2025.10   44.8
2026.03   44.7
2026.03   44.7
2025.10   43.6
2025.10   43.2
2026.03   42.8
2026.03   42.8
2026.01   42.59
2025.10   42
2025.10   42
2026.03   41.2
2026.03   41.2
2026.03   40.5
2026.03   40.5
Showing 100 of 116 rows