
AdvBench

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Jailbreak Attack | AdvBench | AASR 8,712 | 263
Safety Evaluation | Advbench | Safety Score 100 | 117
Jailbreaking | AdvBench | ASR 100 | 114
Adversarial Attack Success Rate | AdvBench | ASR 0 | 75
Jailbreak Attack | AdvBench (test) | ASR 7,827 | 73
Jailbreak Attack | Advbench-M | Attack Success Rate (ASR%) 0 | 64
Jailbreak | AdvBench | Avg Queries 2.1 | 63
Jailbreaking | AdvBench | BERT Score 4.84 | 55
Jailbreak Defense | AdvBench | ASR (Overall) 0 | 49
Jailbreak Attack | AdvBench | Attack Success Rate (ASR) 0 | 48
Jailbreak Attack | AdvBench 50 | ASR (KW) 100 | 48
Safety Refusal | AdvBench | Refusal Rate 99.42 | 46
Jailbreak Attack | AdvBench 150 Harmful Behaviors | ASR 0 | 45
Harmful Request Defense | AdvBench | ASR 0 | 44
Jailbreak Attack | AdvBench | Attack Success Rate (ASR) 26.7 | 40
Jailbreak Defense | AdvBench PAD | ASR 12.12 | 40
Jailbreak Attack | AdvBench-50 + Malicious Instruct | ASR 100 | 40
Jailbreaking | AdvBench Sub | BERT Score 4.73 | 40
Transferable Adversarial Attack | AdvBench LLM Classifier (test) | TASR@1 9,260 | 39
Jailbreak Attack | AdvBench GPT-3.5-turbo 1.0 (test) | Attack Success Rate 97.12 | 38
Jailbreak Attack | AdvBench | Avg.Q 93.48 | 36
Jailbreak Defense | AdvBench PAIR attack | DSR 98 | 35
Jailbreaking | AdvBench (test) | Average ASR 99.04 | 33
Safety evaluation | AdvBench 50 examples | Safe Response Rate 100 | 32
Visual Jailbreaking Attack | AdvBench | ASR 43.84 | 32
Showing 25 of 120 rows