Code Generation on HumanEval and MBPP (Aggregated)

85.6 Overall Average Score

GPT-4-1106-preview

[Chart: scores over time, May 6, 2024 to Dec 18, 2025]
Updated 4d ago

Evaluation Results

Date     Overall
2024.05  85.6     -     -     -     -     -     77.5
2024.05  79.7     -     -     -     -     -     70.2
2025.12  78.9     77.4  -     80.4  -     -     -
2025.12  78.9     82.9  -     74.9  -     -     -
2025.12  78.8     75.6  -     82    -     -     -
2025.12  77.9     79.9  -     75.9  -     -     -
2025.12  74.8     73.8  -     75.7  -     -     -
2025.12  73.6     71.3  -     75.9  -     -     -
2024.05  66.1     -     -     -     -     -     58.2
2024.05  65.8     -     -     -     -     -     58
2024.05  63.3     -     -     -     -     -     55.3
2024.05  61.9     -     -     -     -     -     53.3
2025.03  60.9     65.9  62.2  63.4  51.9  -     -
2025.03  60.3     64.6  60.4  63.7  52.4  -     -
2025.03  60.2     64.6  61    63.9  51.4  -     -
2024.07  58.7     59.5  53.8  65.4  56.1  -     -
2024.07  58.4     59.1  52.9  65.7  55.8  -     -
2024.07  57.6     57.9  52.4  64.8  55.3  -     -
2024.07  56.2     56.3  50.8  64    53.6  -     -
2024.05  52.3     -     -     -     -     -     44.7
2024.05  51.2     -     -     -     -     -     43
2024.07  44       33.5  29.3  61.4  51.6  -     -
2024.05  43.4     -     -     -     -     -     36.5
2024.05  38.7     -     -     -     -     -     32.6
2024.07  32.8     33.8  29    36.5  31.9  -     -
2024.07  32.7     33.2  28.8  36.8  31.8  -     -
2024.07  31.4     31.7  28    35.4  30.4  -     -
2024.05  31.4     -     -     -     -     -     26.5
2024.07  29.5     28.9  24.6  35.1  29.6  -     -
2024.07  18.8     11    9.8   30.2  24.1  -     -
2025.02  -        7.9   -     16.5  -     -     -
2025.02  -        15.9  -     28.8  -     -     -
2025.02  -        10.4  -     21.3  -     -     -
2025.02  -        9.1   -     24.1  -     -     -
2025.02  -        20.1  -     42.6  -     -     -
2025.02  -        21.3  -     40.4  -     -     -
2025.02  -        20.6  -     41.3  -     -     -
2025.02  -        20.1  -     39.1  -     -     -
2025.02  -        20.7  -     39.3  -     -     -