Vulnerability Detection on PyVul (test)
[Chart: top entry MULTIVER, Recall 82.7, Feb 19, 2026. Metrics: Recall, Precision, F1 Score.]
Updated 4d ago
Evaluation Results
| Method   | Type       | Date    | Recall | Precision | F1 Score |
|----------|------------|---------|--------|-----------|----------|
| MULTIVER | Zero-shot  | 2026.02 | 82.7   | 48.8      | 61.4     |
| GPT-3.5  | Fine-tuned | 2026.02 | 81.3   | 63.9      | 71.6     |
| CodeQwen | Fine-tuned | 2026.02 | 75.3   | 60.1      | 66.9     |
| CodeQwen | Zero-shot  | 2026.02 | 66.9   | 57.4      | 61.8     |
| GPT-3.5  | Zero-shot  | 2026.02 | 61.1   | 50.8      | 55.5     |
| GPT-4    | Zero-shot  | 2026.02 | 33.3   | 65.8      | 44.3     |
| CodeQL   | Rule       | 2026.02 | 10.8   | -         | -        |
| Bandit   | Rule       | 2026.02 | 5.3    | -         | -        |
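
The page does not say how F1 Score is derived. A minimal sketch, assuming it is the standard harmonic mean of precision and recall, recomputes F1 for the six methods above that report all three metrics. The function name `f1_score` and the `rows` list are illustrative and not part of the leaderboard; the numbers are copied from the results above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, type, recall, precision, reported F1), copied from the results above.
# CodeQL and Bandit are omitted because they report recall only.
rows = [
    ("MULTIVER", "Zero-shot", 82.7, 48.8, 61.4),
    ("GPT-3.5", "Fine-tuned", 81.3, 63.9, 71.6),
    ("CodeQwen", "Fine-tuned", 75.3, 60.1, 66.9),
    ("CodeQwen", "Zero-shot", 66.9, 57.4, 61.8),
    ("GPT-3.5", "Zero-shot", 61.1, 50.8, 55.5),
    ("GPT-4", "Zero-shot", 33.3, 65.8, 44.3),
]

for method, kind, recall, precision, reported in rows:
    computed = f1_score(precision, recall)
    # Assumption: small gaps come from the inputs already being rounded.
    print(f"{method} ({kind}): computed {computed:.2f}, reported {reported}")
```

Under this assumption, every recomputed value lands within 0.1 of the reported F1. That is consistent with the harmonic-mean definition applied to unrounded precision and recall, though the page itself does not confirm it.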