Issue-level Localization on SecGenEval-PS CodeAnalysis
[Chart: Success Rate @ Issue by date. Highlighted point: o3-mini, 59.1, Jan 10, 2026. Selectable metrics: Success Rate @ Issue, F1 Score.]
Updated 4d ago
Evaluation Results
Method                        Evaluation Mode   Date      Success Rate @ Issue   F1 Score
o3-mini                       M2                2026.01   59.1                   34.7
o3-mini                       M3                2026.01   56.7                   37.9
GPT-4o                        M3                2026.01   25.1                   6.7
GPT-4o                        M2                2026.01   23.3                   6
GPT-4o                        M1                2026.01   13.6                   4
o3-mini                       M1                2026.01   6.8                    2.4
DeepSeek-R1-Distill-Qwen-7B   M1                2026.01   1.3                    0.9
Qwen2.5-Coder-7B              M1                2026.01   0                      0
Qwen2.5-Coder-7B              M2                2026.01   0                      0
Qwen2.5-Coder-7B              M3                2026.01   0                      0
Qwen2.5-7B                    M1                2026.01   0                      0