Rule-level Identification on SecGenEval-PS CodeAnalysis
[Chart: Success Rate @1 Rule and F1 Score. Top result: 85.2 Success Rate @1 Rule, o3-mini, Jan 10, 2026]
Updated 4d ago
Evaluation Results
Method                        Evaluation Mode   Date      Success Rate @1 Rule   F1 Score
o3-mini                       M2                2026.01   85.2                   60.8
o3-mini                       M3                2026.01   84.5                   68.6
GPT-4o                        M3                2026.01   80.7                   47.5
GPT-4o                        M2                2026.01   78                     40.8
GPT-4o                        M1                2026.01   55.2                   26.7
Qwen2.5-7B                    M1                2026.01   50                     0
o3-mini                       M1                2026.01   13.6                   10.8
DeepSeek-R1-Distill-Qwen-7B   M1                2026.01   13.2                   4
Qwen2.5-Coder-7B              M3                2026.01   8.3                    2.1
Qwen2.5-Coder-7B              M2                2026.01   1.5                    0
Qwen2.5-Coder-7B              M1                2026.01   0                      0