
Training-Free Safe Denoisers for Safe Use of Diffusion Models

About

There is growing concern over the safety of powerful diffusion models (DMs), as they are often misused to produce inappropriate, not-safe-for-work (NSFW) content, or to generate copyrighted material or data of individuals who wish to be forgotten. Many existing methods tackle these issues by relying heavily on text-based negative prompts or by extensively retraining DMs to eliminate certain features or samples. In this paper, we take a radically different approach, directly modifying the sampling trajectory by leveraging a negation set (e.g., unsafe images, copyrighted data, or datapoints that need to be excluded) to avoid specific regions of the data distribution, without needing to retrain or fine-tune DMs. We formally derive the relationship between the expected denoised samples that are safe and those that are not, leading to our $\textit{safe}$ denoiser, which ensures its final samples stay away from the region to be negated. Building on this derivation, we develop a practical algorithm that produces high-quality samples while avoiding negation regions of the data distribution in text-conditional, class-conditional, and unconditional image generation. These results point to the potential of our training-free safe denoiser for using DMs more safely.
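The core relationship the abstract mentions can be illustrated with the law of total expectation: splitting the data distribution into a negation region and its safe complement, the full denoiser output decomposes into a mixture of the two conditional means, so the safe conditional mean can be recovered by subtracting the negated component. The sketch below is an illustrative toy, not the paper's implementation; the function name `safe_denoiser`, the mixture weight `w`, and the stand-in arrays are assumptions for exposition.

```python
import numpy as np

def safe_denoiser(denoised_total, denoised_neg, w):
    """Illustrative sketch of the decomposition (not the paper's exact algorithm).

    By the law of total expectation over the data split into a negation
    region N and its safe complement,
        E[x0|xt] = w * E[x0|xt, N] + (1 - w) * E[x0|xt, safe],
    where w = P(x0 in N | xt). Solving for the safe conditional mean:
        E[x0|xt, safe] = (E[x0|xt] - w * E[x0|xt, N]) / (1 - w).
    """
    assert 0.0 <= w < 1.0, "w is the posterior mass of the negation set"
    return (denoised_total - w * denoised_neg) / (1.0 - w)

# Toy check: recombining the safe and negated means recovers the total mean.
total = np.array([0.2, 0.5])   # stand-in for a denoiser output E[x0|xt]
neg = np.array([0.9, 0.9])     # stand-in for the negation-region mean
w = 0.3
safe = safe_denoiser(total, neg, w)
print(np.allclose(w * neg + (1 - w) * safe, total))  # True
```

In a real sampler, `denoised_total` would come from the pretrained network and `denoised_neg` from an estimate over the negation set, with the adjusted mean plugged back into each denoising step.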

Mingyu Kim, Dongjun Kim, Amman Yusuf, Stefano Ermon, Mijung Park • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Text-to-Image Generation | COCO 30k | FID | 22.55 | 29 |
| Safe Text-to-Image Generation | MMA-Diffusion | Automatic Safety Rate | 48.1 | 20 |
| Text-to-Image Generation | UnlearnDiff | ASR | 52.6 | 7 |
| Inappropriate Content Evaluation | CoPro | Harassment | 15.6 | 6 |
