Unit test generation on MBPP+ (test)
[Chart: Error Rate. Highlighted point: CVeDRL, 0.53, Jan 30, 2026. Selectable metrics: Error Rate, Failure Rate, Pass Rate, Basic Correctness, AN Score.]
Updated 4d ago
Evaluation Results
Method    Scale  Date     Error Rate  Failure Rate  Pass Rate  Basic Correctness  AN Score
CVeDRL    0.6B   2026.01  0.53        15.79         83.68      97.37              3.13
CodeRM    8.0B   2026.01  2.44        52.86         44.7       97.11              7.88
GPT-4o    -      2026.01  3.98        29.89         66.13      96.91              6.12
GPT-3.5   -      2026.01  5.14        40.15         54.71      96.65              5.97
Qwen3     32B    2026.01  11.42       31.39         57.19      92.44              7.47
LLaMA3.1  8.0B   2026.01  15.79       47.53         36.68      95.93              4.13
Qwen3     0.6B   2026.01  28.18       31.53         40.29      90.12              3.48
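The page does not state units for the rate columns. One quick way to read them is to check whether Error Rate, Failure Rate, and Pass Rate add up to 100 for each method, which would suggest they are percentages that partition the generated tests. The sketch below transcribes the reported values and runs that check, then ranks the methods by Pass Rate. The names `ROWS`, `rate_total`, `rows_sum_to_100`, and `rank_by_pass_rate` are hypothetical helpers introduced for illustration, and the 0.01 tolerance is an assumption to absorb rounding.

```python
# Values transcribed from the Evaluation Results listing.
# Tuple layout (an illustrative choice, not from the source):
# (method, scale, error_rate, failure_rate, pass_rate)
ROWS = [
    ("CVeDRL", "0.6B", 0.53, 15.79, 83.68),
    ("CodeRM", "8.0B", 2.44, 52.86, 44.7),
    ("GPT-4o", "-", 3.98, 29.89, 66.13),
    ("GPT-3.5", "-", 5.14, 40.15, 54.71),
    ("Qwen3", "32B", 11.42, 31.39, 57.19),
    ("LLaMA3.1", "8.0B", 15.79, 47.53, 36.68),
    ("Qwen3", "0.6B", 28.18, 31.53, 40.29),
]


def rate_total(row):
    """Return error rate + failure rate + pass rate for one row."""
    _, _, error, failure, passed = row
    return error + failure + passed


def rows_sum_to_100(rows, tol=0.01):
    """True if every row's three rates sum to 100 within the tolerance."""
    return all(abs(rate_total(row) - 100.0) <= tol for row in rows)


def rank_by_pass_rate(rows):
    """Return rows sorted from highest to lowest pass rate."""
    return sorted(rows, key=lambda row: row[4], reverse=True)


if __name__ == "__main__":
    print("rates sum to 100:", rows_sum_to_100(ROWS))
    for method, scale, _, _, passed in rank_by_pass_rate(ROWS):
        print(f"{method:<9} {scale:<5} {passed}")
```

If the check holds, the three columns can be treated as complementary shares of the same set of generated tests, so a lower Error Rate or Failure Rate necessarily shows up as a higher Pass Rate. Basic Correctness and AN Score are not part of that sum and are left out of the sketch.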