DAN

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Backdoor Attack | DAN (Do-Anything-Now) | ASRw: 88.07 | 48 |
| Safety Evaluation | DAN | Safety Score (DAN): 91 | 26 |
| Harmful Refusal | DAN | ASR: 94.9 | 16 |
| Jailbreak Defense | DAN | Drop in ASR: 42.9 | 6 |
| Jailbreak Detection | DAN | Detection Rate: 100 | 4 |
| Jailbreak Robustness | DAN Static | ASR: 74.8 | 3 |