
HumanEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Code Generation | HumanEval | Pass@1: 7,927 | 1,036 |
| Code Generation | HumanEval (test) | Pass@1: 100 | 506 |
| Code Generation | HumanEval+ | Pass@1: 100 | 383 |
| Code Generation | HumanEval | Pass@1: 94.1 | 171 |
| Code Generation | HumanEval 1.0 (test) | Pass@1: 85.4 | 145 |
| Coding | HumanEval | Pass@1: 98.17 | 103 |
| Code Generation | HumanEval | Accuracy: 97.56 | 99 |
| Code Generation | HumanEval | HumanEval Score: 94.51 | 93 |
| Code Generation | HumanEval-ET | Pass@1: 89.6 | 92 |
| Coding | HumanEval+ | Pass@1: 95.12 | 83 |
| Code | HumanEval | HumanEval Accuracy: 95.1 | 79 |
| Code Generation | HumanEval | Accuracy (%): 63.8 | 77 |
| Code Generation | HumanEval | Acc: 98.27 | 65 |
| Code Generation | HumanEval | Tokens/s: 287.77 | 61 |
| Function-level Code Generation | HumanEval+ augmented (test) | Pass@1: 90 | 57 |
| Code Generation | HumanEval+ | Pass Rate: 95.1 | 56 |
| Code Generation | HumanEval | Tau: 10.72 | 55 |
| Inference Efficiency | HumanEval | Speedup Factor: 5.15 | 54 |
| Code Generation | HumanEval Multilingual (test) | Average Score: 76.5 | 52 |
| Code Generation | HumanEval | Accuracy: 81.7 | 51 |
| Code Generation | HumanEval | HumanEval Score: 93 | 50 |
| Code Generation | HumanEval | Average Tau (τ): 1.94 | 45 |
| Code Generation | HumanEval @WizardCoder (test) | Pass@1: 71.95 | 45 |
| Code generation | HumanEval | Success Rate (SR): 6.06 | 43 |
| Code Debugging | HumanEval | Accuracy: 96.3 | 42 |
Showing 25 of 181 rows
...