Unit test generation on HumanEval+ (test)
[Chart: Error Rate by date. CVeDRL at 1.27 on Jan 30, 2026. Available metrics: Error Rate, Failure Rate, Pass Rate, Branch Coverage, AN Score.]
Updated 4d ago
Evaluation Results
| Method | Scale | Date | Error Rate | Failure Rate | Pass Rate | Branch Coverage | AN Score |
|---|---|---|---|---|---|---|---|
| CVeDRL | 0.6B | 2026.01 | 1.27 | 12.79 | 85.94 | 97.53 | 2.41 |
| GPT-4o | - | 2026.01 | 1.98 | 17.21 | 80.81 | 96.91 | 5.35 |
| CodeRM | 8.0B | 2026.01 | 2.44 | 64.73 | 32.83 | 96.97 | 7.15 |
| GPT-3.5 | - | 2026.01 | 3.14 | 26.32 | 70.54 | 96.73 | 4.13 |
| Qwen3 | 32B | 2026.01 | 9.43 | 25.31 | 65.26 | 89.53 | 8.17 |
| LLaMA3.1 | 8.0B | 2026.01 | 10.88 | 37.19 | 51.93 | 94.6 | 3.97 |
| Qwen3 | 0.6B | 2026.01 | 20.7 | 44.82 | 34.48 | 73.19 | 4.38 |
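
In every row of the results above, Error Rate, Failure Rate, and Pass Rate appear to add up to 100, which suggests they are percentages that partition the generated tests into three outcomes. The page does not state this, so treat it as an observation from the numbers rather than a documented definition. A minimal sketch of that sanity check follows; the names `ROWS`, `rate_total`, and `rows_off_by_more_than` are hypothetical helpers introduced here for illustration, and the tolerance is an assumed allowance for rounding.

```python
# Rows copied from the results listed above: (method, scale, error, failure, pass).
# Assumption: the three rates are percentages of the same set of generated tests.
ROWS = [
    ("CVeDRL", "0.6B", 1.27, 12.79, 85.94),
    ("GPT-4o", "-", 1.98, 17.21, 80.81),
    ("CodeRM", "8.0B", 2.44, 64.73, 32.83),
    ("GPT-3.5", "-", 3.14, 26.32, 70.54),
    ("Qwen3", "32B", 9.43, 25.31, 65.26),
    ("LLaMA3.1", "8.0B", 10.88, 37.19, 51.93),
    ("Qwen3", "0.6B", 20.7, 44.82, 34.48),
]


def rate_total(row):
    """Sum the error, failure, and pass rates of one row, rounded to 2 decimals."""
    _, _, error, failure, passed = row
    return round(error + failure + passed, 2)


def rows_off_by_more_than(rows, tolerance=0.05):
    """Return (method, scale, total) for rows whose rates do not sum to ~100."""
    return [
        (row[0], row[1], rate_total(row))
        for row in rows
        if abs(rate_total(row) - 100.0) > tolerance
    ]


if __name__ == "__main__":
    for row in ROWS:
        print(row[0], row[1], rate_total(row))
    print("rows off by more than tolerance:", rows_off_by_more_than(ROWS))
```

Branch Coverage and AN Score are left out of the check because nothing in the listed values ties them to the three outcome rates.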