
Jailbreak Attacks

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Jailbreak Defense | Jailbreak Attacks | ES ASR: 28.1 | 10
Jailbreak Defense | Jailbreak Attacks ReNeLLM, DeepInception, GPTFuzzer, CodeAttack (test) | Harmful Score (ReNeLLM): 1 | 6
Attack Detection | Jailbreak Attacks 105K sample set | Detection Rate: 68 | 4