Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
About
Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. The common principle of previous works to remove a specific concept is to map it to a fixed generic concept, such as a neutral concept or just an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which *dynamically* selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects. Experimental results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance. Our code is published at https://github.com/tuananhbui89/Adaptive-Guided-Erasure.
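To make the contrast between a fixed target and a per-concept target concrete, here is a toy sketch in Python. It is not the AGE selection rule, which the abstract does not specify. It assumes, purely for illustration, that each concept has an embedding vector and that a target is chosen as the most similar other concept by cosine similarity. The 2-D vectors, the `pick_target` helper, and the `FIXED_TARGET` constant are all hypothetical.

```python
import math

# Fixed-target baseline described in the abstract: every erased concept
# is mapped to the same generic target (here, the empty prompt).
FIXED_TARGET = ""

# Hypothetical 2-D concept embeddings, invented for this illustration only.
EMBEDDINGS = {
    "van gogh": (0.9, 0.1),
    "monet": (0.8, 0.3),
    "church": (0.1, 0.9),
    "chapel": (0.2, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pick_target(concept, embeddings):
    """Return the other concept most similar to `concept` (toy heuristic)."""
    candidates = [c for c in embeddings if c != concept]
    return max(candidates, key=lambda c: cosine(embeddings[concept], embeddings[c]))


if __name__ == "__main__":
    for concept in ("van gogh", "church"):
        adaptive = pick_target(concept, EMBEDDINGS)
        print(f"{concept!r}: fixed target {FIXED_TARGET!r}, per-concept target {adaptive!r}")
```

In this toy setting the fixed strategy sends both concepts to the same empty prompt, while the per-concept strategy picks a different target for each. How AGE actually scores and selects targets is defined in the paper and the linked repository, not here.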
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Concept Erasure | Van Gogh style | FID 17.32 | 39 |
| Nudity Erasure | I2P | Total Count 300 | 38 |
| Artistic Style Erasure | SD Target Class artistic styles 1.4 (test) | Erased Accuracy 31.5 | 36 |
| Artistic Style Erasure | SD Other Class artistic styles 1.4 (test) | Preservation Drop 4.3 | 36 |
| Utility Preservation | COCO-10K (val) | FID 24.01 | 20 |
| Object Erasure | ImageNet-10 Target Concepts SD 1.4 | Original Accuracy 0.859 | 19 |
| Nudity Erasure | I2P 1.0 (test) | ASR (UD Attack) 9.47 | 16 |
| Concept Preservation | ImageNet 10 Preserved Concepts SD 1.4 | Original Accuracy 85.9 | 15 |
| Concept Erasure | Stable Diffusion Church object v1.4 | ASR 10.66 | 13 |
| Concept Erasure | Stable Diffusion Nudity Concept v1.4 | ASR 189.45 | 12 |