
HumanEval+

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Code Generation | HumanEval+ (test) | Pass@1: 98.1 | 132 |
| Code Generation | HumanEval+ v1 (test) | Pass Rate: 87.8 | 55 |
| Code Reasoning | HumanEval+ | Pass@1697 | 12 |
| Unit test generation | HumanEval+ (test) | Error Rate: 1.27 | 7 |
| Code Generation | HumanEval+ | Score: 34.76 | 5 |
| Code Generation | HumanEval+ ko | Score: 92.1 | 3 |