
Adversarial Jailbreak Attacks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Jailbreak Defense | Adversarial Jailbreak Attacks Cipher, Instructional Constraint, Prefix Injection, Psychological Coercion (Alternative) | Safety Score (Cipher): 100 | 5