Safety Evaluation on ActorAttack
Chart: ASR by submission date (best entry: GPT-4o, ASR 3.5, Feb 28, 2025)
Evaluation Results
Model              Method              Date     ASR (%, lower is better)
GPT-4o             NBF steering=true   2025.02  3.5
GPT-3.5-turbo      NBF steering=true   2025.02  4
Claude 3.5 Sonnet  NBF steering=true   2025.02  4
o1                 NBF steering=true   2025.02  9
Claude 3.5 Sonnet  NBF steering=false  2025.02  20
o1                 NBF steering=false  2025.02  51
GPT-3.5-turbo      NBF steering=false  2025.02  58.5
GPT-4o             NBF steering=false  2025.02  60
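Each model appears twice in these results, once with NBF steering enabled and once without, so the effect of steering can be read as the drop in ASR per model. The short sketch below computes that drop both in absolute percentage points and relative to the unsteered ASR. The ASR values are copied from the results listed here. The function names `asr_reduction` and `summarize` are hypothetical helpers for illustration, not part of any benchmark tooling.

```python
# ASR values (%) copied from the evaluation results, keyed by model.
# Lower ASR means fewer successful multi-turn jailbreak attempts.
ASR_WITHOUT_STEERING = {
    "GPT-4o": 60.0,
    "GPT-3.5-turbo": 58.5,
    "Claude 3.5 Sonnet": 20.0,
    "o1": 51.0,
}
ASR_WITH_STEERING = {
    "GPT-4o": 3.5,
    "GPT-3.5-turbo": 4.0,
    "Claude 3.5 Sonnet": 4.0,
    "o1": 9.0,
}


def asr_reduction(before: float, after: float) -> tuple[float, float]:
    """Hypothetical helper: return (absolute drop in percentage points,
    relative drop as a percentage of the original ASR)."""
    absolute = before - after
    # Guard against division by zero when the baseline ASR is 0.
    relative = 100.0 * absolute / before if before else 0.0
    return absolute, relative


def summarize() -> dict[str, tuple[float, float]]:
    """Hypothetical helper: ASR reduction from steering for every model."""
    return {
        model: asr_reduction(ASR_WITHOUT_STEERING[model], ASR_WITH_STEERING[model])
        for model in ASR_WITHOUT_STEERING
    }


if __name__ == "__main__":
    for model, (absolute, relative) in summarize().items():
        print(f"{model}: -{absolute:.1f} pp ({relative:.1f}% relative)")
```

Reporting both numbers matters here because the baselines differ widely: Claude 3.5 Sonnet starts at 20, so its 16-point drop is an 80% relative reduction, while GPT-4o's 56.5-point drop from 60 is roughly a 94% relative reduction.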