Unit test generation on LeetCode (test)
[Chart: Error Rate (%) by date. Highlighted point: GPT-4o, 2.77, Jan 30, 2026. Selectable metrics: Error Rate (%), Failure Rate (%), Pass Rate (%), Build Correctness (%), AN Score.]
Evaluation Results
| Method | Scale | Date | Error Rate (%) | Failure Rate (%) | Pass Rate (%) | Build Correctness (%) | AN Score |
|---|---|---|---|---|---|---|---|
| GPT-4o | - | 2026.01 | 2.77 | 25.14 | 72.09 | 83.64 | 5.77 |
| CVeDRL | 0.6B | 2026.01 | 3.49 | 20.98 | 75.53 | 91.61 | 2.84 |
| GPT-3.5 | - | 2026.01 | 3.53 | 37.28 | 59.19 | 76.53 | 5.63 |
| CodeRM | 8.0B | 2026.01 | 3.7 | 58.16 | 38.14 | 75.37 | 6.43 |
| LLaMA3.1 | 8.0B | 2026.01 | 12.47 | 54.77 | 32.76 | 70.49 | 3.88 |
| Qwen3 | 32B | 2026.01 | 23.18 | 27.63 | 49.19 | 68.62 | 7.34 |
| Qwen3 | 0.6B | 2026.01 | 37.31 | 42.26 | 20.43 | 61.47 | 3.76 |
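
As a quick consistency check on the results above, the error, failure, and pass rates appear to sum to 100% for every method. The following is a minimal Python sketch with the values transcribed from the leaderboard. The names `rows`, `rates_sum`, `sums_ok`, and `by_pass` are chosen here for illustration. That the three rates partition all generated tests is an assumption, not something the page states.

```python
# Rows transcribed from the leaderboard:
# (method, scale, error rate %, failure rate %, pass rate %).
rows = [
    ("GPT-4o", "-", 2.77, 25.14, 72.09),
    ("CVeDRL", "0.6B", 3.49, 20.98, 75.53),
    ("GPT-3.5", "-", 3.53, 37.28, 59.19),
    ("CodeRM", "8.0B", 3.7, 58.16, 38.14),
    ("LLaMA3.1", "8.0B", 12.47, 54.77, 32.76),
    ("Qwen3", "32B", 23.18, 27.63, 49.19),
    ("Qwen3", "0.6B", 37.31, 42.26, 20.43),
]


def rates_sum(row):
    """Sum of error, failure, and pass rates for one row (hypothetical helper)."""
    _, _, error, failure, passed = row
    # Round to two decimals to absorb floating-point noise.
    return round(error + failure + passed, 2)


# Assumption: the three rates partition all outcomes, so each sum should be 100.
sums_ok = all(rates_sum(r) == 100.0 for r in rows)

# Rank methods by pass rate, highest first.
by_pass = sorted(rows, key=lambda r: r[4], reverse=True)
best = by_pass[0]

print("rates sum to 100 for every row:", sums_ok)
print("highest pass rate:", best[0], best[1], best[4])
```

The page lists methods in order of ascending error rate, where GPT-4o leads at 2.77. Sorting by pass rate instead puts CVeDRL (0.6B) first at 75.53.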