Unit test generation on HumanEval+ (test)
[Chart: Error Rate over time. Plotted point: CVeDRL, 1.27 (Jan 30, 2026). Selectable metrics: Error Rate, Failure Rate, Pass Rate, Branch Coverage, AN Score.]
Evaluation Results
| Method   | Scale | Date    | Error Rate | Failure Rate | Pass Rate | Branch Coverage | AN Score |
|----------|-------|---------|------------|--------------|-----------|-----------------|----------|
| CVeDRL   | 0.6B  | 2026.01 | 1.27       | 12.79        | 85.94     | 97.53           | 2.41     |
| GPT-4o   | -     | 2026.01 | 1.98       | 17.21        | 80.81     | 96.91           | 5.35     |
| CodeRM   | 8.0B  | 2026.01 | 2.44       | 64.73        | 32.83     | 96.97           | 7.15     |
| GPT-3.5  | -     | 2026.01 | 3.14       | 26.32        | 70.54     | 96.73           | 4.13     |
| Qwen3    | 32B   | 2026.01 | 9.43       | 25.31        | 65.26     | 89.53           | 8.17     |
| LLaMA3.1 | 8.0B  | 2026.01 | 10.88      | 37.19        | 51.93     | 94.6            | 3.97     |
| Qwen3    | 0.6B  | 2026.01 | 20.7       | 44.82        | 34.48     | 73.19           | 4.38     |