UnsafeDiff

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Jailbreak Attack | UnsafeDiff | Attack Success Rate (ASR): 1.7 | 12 |
| Safety Evaluation | UnsafeDiff (test) | F1 Score: 94.2 | 11 |
| Black-box NSFW Filter Attack | UnsafeDiff (test) | Adult Bypass Rate: 22 | 2 |