Code Generation on HumanEval (Accuracy, Mean, Drop)

94.05 Accuracy

BF16

May 18, 2026
Updated 14d ago

Evaluation Results

Date      Accuracy   Mean      Drop
2026.05   94.05      75.64     -
2026.05   92.24      71.864    -3.78
2026.05   91.46      77.89     -
2026.05   91.3       77.95     0.06
2026.05   91.19      74.19     -
2026.05   91.06      78.16     0.27
2026.05   90.85      74.43     0.24
2026.05   90.24      78.15     0.26
2026.05   90.12      74.17     -0.02
2026.05   89.87      75.14     -2.75
2026.05   89.78      73.11     -2.53
2026.05   88.41      71.99     -2.2
2026.05   87.88      69.416    -1.42
2026.05   86.5       60.49     -17.4
2026.05   86.44      69.97     -0.87
2026.05   85.95      70.84     -
2026.05   74.63      56.88     -13.96
2026.05   31.83      31.74     -43.9
2026.05   9.8        10.14     -60.7
2026.05   1.83       7.9       -66.29
2026.05   0.98       1.4       -74.24
2026.05   0          0         -75.64
2026.05   0          0         -70.84
2026.05   0          0         -74.19
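
The page does not define the Drop column. The reported numbers are consistent with one reading: Drop is a row's Mean minus the Mean of a baseline row (a row whose Drop is "-"). The sketch below checks that reading against a few rows. The pairing of rows to baselines is an assumption inferred from the numbers, not something the page states, and `matching_baseline` is a hypothetical helper name.

```python
# Assumption: rows with Drop "-" are baselines; these are their Mean scores.
BASELINE_MEANS = [75.64, 77.89, 74.19, 70.84]

# (Mean, Drop) pairs copied from some non-baseline rows of the results.
ROWS = [
    (71.864, -3.78),
    (77.95, 0.06),
    (74.43, 0.24),
    (69.416, -1.42),
    (60.49, -17.4),
    (10.14, -60.7),
    (0.0, -75.64),
]


def matching_baseline(mean, drop, tol=0.006):
    """Return the first baseline Mean b with (mean - b) within tol of drop.

    tol allows for Drop being rounded to two decimals. Returns None when
    no baseline fits.
    """
    for b in BASELINE_MEANS:
        if abs((mean - b) - drop) <= tol:
            return b
    return None


if __name__ == "__main__":
    for mean, drop in ROWS:
        print(f"Mean={mean}, Drop={drop}, baseline={matching_baseline(mean, drop)}")
```

Every sampled row matches exactly one baseline under this reading, which suggests the results group into four families that each share a baseline. This should be confirmed against the original leaderboard, where the method names are visible.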