
HumanEval+

Benchmarks

Task Name             Dataset Name          SOTA Result              Trend
Code Generation       HumanEval+ (test)     Pass@1 84.76             93
Code Generation       HumanEval+ v1 (test)  Pass Rate 87.8           41
Unit test generation  HumanEval+ (test)     Error Rate 1.27          7
Code Reasoning        HumanEval+            Average Score @16 82.29  6
Code Generation       HumanEval+ ko         Score 92.1               3