Jailbreak Attack on ActorAttack (test)
[Chart: ASR by date. Top result: 54 ASR, Claude Sonnet 4.5, Feb 28, 2025.]
Updated 4d ago
Evaluation Results
Method             Safety Steering  Date     ASR
Claude Sonnet 4.5  Disabled         2025.02  54
GPT-5              Disabled         2025.02  35.5
Claude Sonnet 4.5  Enabled          2025.02  11
GPT-5              Enabled          2025.02  4