Code Generation on HumanEval and MBPP (Aggregated)

Overall Average Score: 85.6

GPT-4-1106-preview

[Chart: scores by date, May 6, 2024 to Mar 9, 2026]
Updated 1mo ago

Evaluation Results

Date     Average
2024.05     85.6      -      -      -      -      -   77.5
2024.05     79.7      -      -      -      -      -   70.2
2025.12     78.9   77.4      -   80.4      -      -      -
2025.12     78.9   82.9      -   74.9      -      -      -
2025.12     78.8   75.6      -     82      -      -      -
2025.12     77.9   79.9      -   75.9      -      -      -
2025.12     74.8   73.8      -   75.7      -      -      -
2025.12     73.6   71.3      -   75.9      -      -      -
2024.05     66.1      -      -      -      -      -   58.2
2024.05     65.8      -      -      -      -      -     58
2024.05     63.3      -      -      -      -      -   55.3
2024.05     61.9      -      -      -      -      -   53.3
2025.03     60.9   65.9   62.2   63.4   51.9      -      -
2025.03     60.3   64.6   60.4   63.7   52.4      -      -
2025.03     60.2   64.6     61   63.9   51.4      -      -
2024.07     58.7   59.5   53.8   65.4   56.1      -      -
2024.07     58.4   59.1   52.9   65.7   55.8      -      -
2024.07     57.6   57.9   52.4   64.8   55.3      -      -
2024.07     56.2   56.3   50.8     64   53.6      -      -
2024.05     52.3      -      -      -      -      -   44.7
2024.05     51.2      -      -      -      -      -     43
2024.07       44   33.5   29.3   61.4   51.6      -      -
2024.05     43.4      -      -      -      -      -   36.5
2024.05     38.7      -      -      -      -      -   32.6
2024.07     32.8   33.8     29   36.5   31.9      -      -
2024.07     32.7   33.2   28.8   36.8   31.8      -      -
2024.07     31.4   31.7     28   35.4   30.4      -      -
2024.05     31.4      -      -      -      -      -   26.5
2024.07     29.5   28.9   24.6   35.1   29.6      -      -
-          21.34  17.07      -   25.6      -      -      -
2026.03    20.33  15.85      -   24.8      -      -      -
2026.03    20.23  16.46      -     24      -      -      -
2026.03    19.62  15.24      -     24      -      -      -
2026.03    19.32  15.24      -   23.4      -      -      -
2024.07     18.8     11    9.8   30.2   24.1      -      -
2026.03    18.21  13.41      -     23      -      -      -
-           17.8  12.19      -   23.4      -      -      -
2025.02        -    7.9      -   16.5      -      -      -
2025.02        -   15.9      -   28.8      -      -      -
2025.02        -   10.4      -   21.3      -      -      -
2025.02        -    9.1      -   24.1      -      -      -
2025.02        -   20.1      -   42.6      -      -      -
2025.02        -   21.3      -   40.4      -      -      -
2025.02        -   20.6      -   41.3      -      -      -
2025.02        -   20.1      -   39.1      -      -      -
2025.02        -   20.7      -   39.3      -      -      -
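
In rows that list two or four individual scores, the leading average appears to be the unweighted mean of those scores. This is an inference from the numbers, not something the page states. The sketch below checks that assumption on three rows taken from the results. The function name `row_average` and the 0.06 tolerance are illustrative choices, and `None` stands for a "-" cell. Rows that show only a single score (for example 85.6 next to 77.5) do not fit this pattern, presumably because some of their scores are not displayed.

```python
from statistics import mean


def row_average(scores):
    """Unweighted mean of the scores a row reports; None marks a '-' cell."""
    reported = [s for s in scores if s is not None]
    return mean(reported) if reported else None


# (listed average, individual scores) for three rows from the results.
rows = [
    (60.9, [65.9, 62.2, 63.4, 51.9, None, None]),
    (58.7, [59.5, 53.8, 65.4, 56.1, None, None]),
    (78.9, [77.4, None, 80.4, None, None, None]),
]

for listed, scores in rows:
    computed = row_average(scores)
    # Listed averages are rounded, so compare with a small tolerance.
    agrees = abs(computed - listed) < 0.06
    print(f"listed={listed}  computed={computed:.3f}  agrees={agrees}")
```

The tolerance is there because the listed averages are rounded, and the page does not say which rounding rule it uses.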