Multi-turn Safety Risk Assessment on PostgreSQL
[Chart: ASR and RR; series shown: w/o Defense; dated Feb 13, 2026]
Evaluation Results
Method        Model               Date      ASR    RR
w/o Defense   Claude-3.5-Sonnet   2026.02   1      0
Firewall      Claude-3.5-Sonnet   2026.02   0.96   0.04
Baseline      Gemini-1.5-Flash    2026.02   0.84   0.04
w/o Defense   Gemini-1.5-Flash    2026.02   0.72   0.16
Firewall      Gemini-1.5-Flash    2026.02   0.68   0.2
ToolShield    Gemini-1.5-Flash    2026.02   0.6    0.4
Baseline      Claude-3.5-Sonnet   2026.02   0.36   0.52
ToolShield    Claude-3.5-Sonnet   2026.02   0.12   0.88