
AdvBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | AdvBench | AASR 8,712 | 271 |
| Adversarial Attack | AdvBench (test) | ASR 100 | 145 |
| Jailbreak Attack | AdvBench | ASR 100 | 133 |
| Jailbreaking | AdvBench | ASR 100 | 132 |
| Safety Evaluation | Advbench | Safety Score 100 | 117 |
| Jailbreak Defense | AdvBench | ASR (PAIR) 0 | 115 |
| Jailbreaking | AdvBench selected models | ASR@10 100 | 90 |
| Adversarial Attack Success Rate | AdvBench | ASR 0 | 75 |
| Jailbreak Attack | AdvBench (test) | ASR 7,827 | 73 |
| Unsafe Robustness | AdvBench | Unsafe Rate 0 | 72 |
| Jailbreak Robustness | AdvBench | Harmbench ASR 0 | 72 |
| Jailbreak Attack | Advbench-M | Attack Success Rate (ASR%) 0 | 64 |
| Jailbreak Attack | Advbench subset | ASR 96 | 64 |
| Jailbreak | AdvBench | Avg Queries 2.1 | 63 |
| Jailbreaking | AdvBench | BERT Score 4.84 | 55 |
| Safety Evaluation | AdvBench | Reasoning Harmfulness Rate 0 | 50 |
| Jailbreak Attack | AdvBench | Attack Success Rate (ASR) 0 | 48 |
| Jailbreak Attack | AdvBench 50 | ASR (KW) 100 | 48 |
| Safety Refusal | AdvBench | Refusal Rate 99.42 | 46 |
| Jailbreak Attack | AdvBench 150 Harmful Behaviors | ASR 0 | 45 |
| Harmful Request Defense | AdvBench | ASR 0 | 44 |
| Safety Evaluation | AdvBench Safety Evaluation | ASR (S1) 1.35 | 42 |
| Jailbreak Attack | AdvBench | ASR 0 | 42 |
| Jailbreak Attack | AdvBench gray-box setting | ASR 100 | 42 |
| Jailbreak Attack | AdvBench | Attack Success Rate (ASR) 26.7 | 40 |
Showing 25 of 178 rows