Jailbreak Defense on Adversarial Jailbreak Attacks (4 Specific Vectors)
[Chart: Safety Score (Cipher) by date (Apr 24, 2026), highlighting Claude Opus 4.5 at 100. Other selectable metrics: Safety Score (Instructional Constraint), Safety Score (Prefix Injection), Safety Score (Psychological Coercion).]
Evaluation Results
| Model | Method | Date | Safety Score (Cipher) | Safety Score (Instructional Constraint) | Safety Score (Prefix Injection) | Safety Score (Psychological Coercion) |
|---|---|---|---|---|---|---|
| Claude Opus 4.5 | Defense Pipeline=Graph... | 2026.04 | 100 | 94.37 | 91.55 | 94.37 |
| Gemini 2.5 Flash-Lite | Defense Pipeline=Graph... | 2026.04 | 94.37 | 89.44 | 92.25 | 91.55 |
| Gemini 2.5 Flash | Defense Pipeline=Graph... | 2026.04 | 93.66 | 92.96 | 92.96 | 38.03 |
| GPT-5 | Defense Pipeline=Graph... | 2026.04 | 93.66 | 91.55 | 90.85 | 92.25 |
| Gemini 2.5 Pro | Defense Pipeline=Graph... | 2026.04 | 92.96 | 92.96 | 95.07 | 89.44 |
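To compare models across all four attack vectors at once, the evaluation results can be summarized per model. The sketch below is illustrative only. It assumes an unweighted mean over the four vectors, which is not an official aggregate stated by the benchmark. It also reports each model's weakest vector, because a single low score (such as Gemini 2.5 Flash's 38.03 on Psychological Coercion) may matter more for jailbreak defense than the average suggests. The names `SCORES`, `VECTORS`, and `summarize` are hypothetical.

```python
from statistics import mean

# Attack vectors, in the column order of the results table.
VECTORS = ("Cipher", "Instructional Constraint", "Prefix Injection", "Psychological Coercion")

# Safety scores copied from the results table (hypothetical container name).
SCORES = {
    "Claude Opus 4.5": (100.0, 94.37, 91.55, 94.37),
    "Gemini 2.5 Flash-Lite": (94.37, 89.44, 92.25, 91.55),
    "Gemini 2.5 Flash": (93.66, 92.96, 92.96, 38.03),
    "GPT-5": (93.66, 91.55, 90.85, 92.25),
    "Gemini 2.5 Pro": (92.96, 92.96, 95.07, 89.44),
}


def summarize(scores):
    """Return (model, unweighted mean, weakest vector, weakest score) rows,
    sorted by mean from highest to lowest."""
    rows = []
    for model, values in scores.items():
        # Index of the lowest score, i.e. the vector the model defends worst.
        worst = min(range(len(values)), key=lambda i: values[i])
        rows.append((model, mean(values), VECTORS[worst], values[worst]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for model, avg, weak, weak_score in summarize(SCORES):
        print(f"{model:<22} mean={avg:6.2f}  weakest={weak} ({weak_score:.2f})")
```

Under this unweighted mean, Claude Opus 4.5 ranks first. Gemini 2.5 Flash falls to last, because its Psychological Coercion score pulls its average well below the others.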