Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
About
Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. Previous works typically remove a specific concept by mapping it to a fixed generic concept, such as a neutral concept or an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which *dynamically* selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects. Experimental results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance. Our code is available at https://github.com/tuananhbui89/Adaptive-Guided-Erasure.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Nudity Erasure | I2P 1.0 (test) | ASR (UD Attack) 9.47 | 16 |
| Concept Erasure | Stable Diffusion Church object v1.4 | ASR 10.66 | 13 |
| Concept Erasure | Van Gogh style | ASR 10.9 | 12 |
| Concept Erasure | Stable Diffusion Nudity Concept v1.4 | ASR 189.45 | 12 |
| Concept Erasure | SD Nudity v2.1 | ASR 180.99 | 9 |
| Concept Erasure | SD Church v2.1 | ASR 182 | 9 |
| Concept Erasure | SD Van Gogh v2.1 | ASR194 | 9 |
| Artistic Style Removal | COCO 10K 2014 (val) | FID 18.81 | 7 |
| Nudity Removal | Adversarial Prompts (test) | MMA 19.1 | 7 |
| Artistic Style Removal | Target Prompts Van Gogh style v1.4 base (test) | ASR 0.00 | 7 |