HumanEval

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Code Generation | HumanEval | Pass@1: 7,927 | 850 |
| Code Generation | HumanEval (test) | Pass@1: 100 | 444 |
| Code Generation | HumanEval+ | Pass@1: 92.7 | 189 |
| Code Generation | HumanEval 1.0 (test) | Pass@1: 85.4 | 145 |
| Code Generation | HumanEval | Pass@1: 94.1 | 108 |
| Code Generation | HumanEval | Accuracy (%): 63.8 | 77 |
| Code Generation | HumanEval-ET | Pass@1: 89.6 | 75 |
| Code Generation | HumanEval | Tokens/s: 287.77 | 61 |
| Inference Efficiency | HumanEval | Speedup Factor: 5.15 | 54 |
| Coding | HumanEval | Pass@1: 98.17 | 52 |
| Code Generation | HumanEval Multilingual (test) | Average Score: 76.5 | 52 |
| Code Generation | HumanEval | Accuracy: 81.7 | 51 |
| Code | HumanEval | HumanEval Accuracy: 93.4 | 50 |
| Code Generation | HumanEval | HumanEval Score: 93 | 50 |
| Function-level Code Generation | HumanEval+ augmented (test) | Pass@1: 90 | 46 |
| Code Generation | HumanEval | Average Tau (τ): 1.94 | 45 |
| Code Generation | HumanEval @WizardCoder (test) | Pass@1: 71.95 | 45 |
| Code Debugging | HumanEval | Accuracy: 96.3 | 42 |
| Code Generation | HumanEval | TPS: 222.68 | 41 |
| Code Reasoning | HumanEval | HumanEval Score: 95.73 | 35 |
| Code Completion | HumanEval+ | Pass@1: 56.7 | 33 |
| Code Verification | HumanEval+ | Pass@1: 87.05 | 32 |
| Coding | HumanEval+ | Pass@1: 95.12 | 31 |
| Code Generation | HumanEval OOD | Pass@1: 32.31 | 30 |
| Code Generation | HumanEval | Functional Score M: 16.47 | 29 |

Showing 25 of 115 rows