
HumanEval

Benchmarks

Task Name                      | Dataset Name                | Metric             | SOTA Result | Trend
-------------------------------|-----------------------------|--------------------|-------------|------
Code Generation                | HumanEval                   | Pass@1             | 7,927       | 1,043
Code Generation                | HumanEval (test)            | Pass@1             | 100         | 612
Code Generation                | HumanEval+                  | Pass@1             | 100         | 393
Code Generation                | HumanEval                   | Accuracy           | 97.56       | 217
Code Generation                | HumanEval                   | Pass@1             | 94.1        | 171
Coding                         | HumanEval                   | Pass@1             | 98.17       | 168
Coding                         | HumanEval+                  | Pass@1             | 95.12       | 164
Code Generation                | HumanEval                   | Speedup Factor     | 8.22        | 147
Code Generation                | HumanEval                   | pass@1             | 93.1        | 145
Code Generation                | HumanEval 1.0 (test)        | Pass@1             | 85.4        | 145
Code Generation                | HumanEval                   | HumanEval Score    | 95.22       | 128
Code                           | HumanEval                   | HumanEval Accuracy | 96.34       | 118
Code Generation                | HumanEval                   | Accuracy           | 98.27       | 115
Code Generation                | HumanEval-ET                | Pass@1             | 89.6        | 108
Inference Efficiency           | HumanEval                   | Speedup Factor     | 5.33        | 90
Code Generation                | HumanEval                   | Accuracy (%)       | 63.8        | 77
Code Generation                | HumanEval+                  | Pass Rate          | 95.1        | 75
Code Generation                | HumanEval 0-shot            | Accuracy           | 57.93       | 69
Code Generation                | HumanEval                   | Acc                | 98.27       | 65
Function-level Code Generation | HumanEval+ augmented (test) | Pass@1             | 90          | 65
Code Reasoning                 | HumanEval                   | HumanEval Score    | 95.73       | 62
Code Generation                | HumanEval+                  | Pass@1             | 86          | 61
Code Generation                | HumanEval                   | Tokens/s           | 287.77      | 61
Coding                         | HumanEval                   | Accuracy           | 95.62       | 60
Code Generation                | HumanEval                   | Score              | 91.56       | 55
Showing 25 of 317 rows
...
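Many of the rows above report Pass@1. For HumanEval, pass@k is commonly computed with the unbiased estimator introduced alongside the benchmark: generate n >= k samples per problem, count the c samples that pass the problem's unit tests, estimate 1 - C(n-c, k) / C(n, k), and average over problems. The sketch below assumes that convention; individual leaderboard entries may follow a different protocol (for example greedy decoding with n = 1), and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical, not taken from any particular evaluation harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (assumed convention).

    n: number of samples generated for the problem
    c: number of those samples that passed all unit tests
    k: sample budget being scored
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a uniformly random size-k subset of the n samples
    # contains at least one passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate; results holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two hypothetical problems: 3 of 10 samples pass, and 5 of 5 samples pass.
    print(mean_pass_at_k([(10, 3), (5, 5)], k=1))
```

For k = 1 the estimate reduces to c / n per problem. Leaderboards typically report the averaged value as a percentage (the estimate multiplied by 100), though this page does not state the scale for each entry.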