
Jailbreak Defense on AdvBench (AutoDAN Attack)

DSR: 100

SmoothLLM

Updated 4mo ago

Evaluation Results

Method	Date	DSR
SmoothLLM	2024.02	100
Paraphrasing	2024.02	100
Response check	2024.02	100
Backtranslation	2024.02	98
Backtranslation	2024.02	98
Response check	2024.02	96
Backtranslation	2024.02	96
Paraphrasing	2024.02	72
No defense	2024.02	64
SmoothLLM	2024.02	64
No defense	2024.02	40
Paraphrasing	2024.02	30
SmoothLLM	2024.02	24
Response check	2024.02	12
No defense	2024.02	4