
CodeEval

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Code Generation | CodeEval-Pro BigCodeBench-Lite-Pro and HumanEval-Pro (Held-out) | Average Accuracy: 79.2 | 18
Code Generation | CodeEval-Pro BigCodeBench-Lite-Pro and HumanEval-Pro (2nd Pass) | Average Accuracy: 64.6 | 18
Code Generation | CodeEval-Pro BigCodeBench-Lite-Pro and HumanEval-Pro (1st Pass) | Average Accuracy: 66.7 | 18