
Jailbreak Attacks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Jailbreak Robustness | Jailbreak Attacks | Prefill Success Rate: 88.8 | 18 |
| Jailbreak Defense | Jailbreak Attacks | GCG ASR: 0 | 18 |
| Jailbreak Defense | Jailbreak Attacks | ES ASR: 28.1 | 10 |
| Safety Defense Evaluation | Jailbreak Attacks GCG, AutoDAN, PAIR, DeepInc., SAP30, SIJ | GCG Attack Score: 1 | 8 |
| Jailbreak Defense | Jailbreak Attacks ReNeLLM, DeepInception, GPTFuzzer, CodeAttack (test) | Harmful Score (ReNeLLM): 1 | 6 |
| Attack Detection | Jailbreak Attacks 105K sample set | Detection Rate: 68 | 4 |