Agent Planning Security and Autonomy on WASP GitLab (test)
[Chart: Attack Success Rate by date. Highlighted point: Basic Agent, 29.2 (Feb 11, 2026). Selectable metrics: Attack Success Rate, HITL Load (Avg), TCR@∞, Avg Turns.]
Updated 4d ago
Evaluation Results
| Method | Model | Date | Attack Success Rate | HITL Load (Avg) | TCR@∞ | Avg Turns |
|---|---|---|---|---|---|---|
| Basic Agent | o1 | 2026.02 | 29.2 | 3.08 | 62.5 | 5.77 |
| Basic Agent | o4-mini | 2026.02 | 25 | 3.06 | 64.6 | 5.58 |
| Basic Agent | GPT-4o | 2026.02 | 20.8 | 2.87 | 64.6 | 5.45 |
| Basic Agent | o3-mini | 2026.02 | 14.6 | 3.65 | 72.9 | 6.26 |
| PRUDENTIA | GPT-4o | 2026.02 | 0 | 0 | 75 | 6.14 |
| PRUDENTIA | o1 | 2026.02 | 0 | 0 | 85.4 | 5.8 |
| PRUDENTIA | o3-mini | 2026.02 | 0 | 0 | 72.9 | 5.6 |
| PRUDENTIA | o4-mini | 2026.02 | 0 | 0 | 72.9 | 6.03 |