
Code Generation on HumanEval+ and LiveCodeBench

84.1 Eval+ Score

Baseline

Oct 15, 2025
Updated 1mo ago

Evaluation Results

Date      Eval+ Score   LiveCodeBench   Average
2025.10   84.1          43.1            63.6
2025.10   84            44              64
2025.10   83.1          41.6            62.4
2025.10   82.8          43.4            63.1
2025.10   82.7          41.9            62.3
2025.10   82.2          41.5            61.9
2025.10   81.9          42.9            62.4
2025.10   77.9          37.9            57.9
2025.10   77.7          38.2            58
2025.10   73.7          29.6            51.6
2025.10   72.2          25.3            48.7
2025.10   48.6          8.2             28.4
2025.10   11.2          0               5.6
2025.10   0.7           1.2             1
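
Each result row carries three scores, and the third looks like the mean of the first two. The sketch below checks that reading. It rests on two assumptions: that each fused figure splits into three numbers as listed in `ROWS`, and that the page rounds the mean to one decimal from unrounded inputs, which would explain small gaps. The names `ROWS` and `max_mean_gap` are illustrative and not taken from the page.

```python
# Assumed split of each row into (first score, second score, third score).
ROWS = [
    (84.1, 43.1, 63.6),
    (84.0, 44.0, 64.0),
    (83.1, 41.6, 62.4),
    (82.8, 43.4, 63.1),
    (82.7, 41.9, 62.3),
    (82.2, 41.5, 61.9),
    (81.9, 42.9, 62.4),
    (77.9, 37.9, 57.9),
    (77.7, 38.2, 58.0),
    (73.7, 29.6, 51.6),
    (72.2, 25.3, 48.7),
    (48.6, 8.2, 28.4),
    (11.2, 0.0, 5.6),
    (0.7, 1.2, 1.0),
]


def max_mean_gap(rows):
    """Return the largest absolute gap between the third score
    and the arithmetic mean of the first two, over all rows."""
    return max(abs((a + b) / 2 - c) for a, b, c in rows)


if __name__ == "__main__":
    # A gap no larger than the one-decimal rounding step supports
    # reading the third column as an average.
    print(f"largest gap: {max_mean_gap(ROWS):.3f}")
```

If the largest gap stays within one rounding step (0.1), the third column is consistent with a plain average of the two benchmark scores. A larger gap would suggest a weighted or otherwise different aggregate.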