Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

About

Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. The common principle of previous works to remove a specific concept is to map it to a fixed generic concept, such as a neutral concept or just an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which dynamically selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects. Experimental results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance. Our code is published at https://github.com/tuananhbui89/Adaptive-Guided-Erasure.

Anh Bui, Trang Vu, Long Vuong, Trung Le, Paul Montague, Tamas Abraham, Junae Kim, Dinh Phung • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Concept Erasure | Van Gogh style | FID 17.32 | 39 |
| Nudity Erasure | I2P | Total Count 300 | 38 |
| Artistic Style Erasure | SD Target Class artistic styles 1.4 (test) | Erased Accuracy 31.5 | 36 |
| Artistic Style Erasure | SD Other Class artistic styles 1.4 (test) | Preservation Drop 4.3 | 36 |
| Utility Preservation | COCO-10K (val) | FID 24.01 | 20 |
| Object Erasure | ImageNet-10 Target Concepts SD 1.4 | Original Accuracy 0.859 | 19 |
| Nudity Erasure | I2P 1.0 (test) | ASR (UD Attack) 9.47 | 16 |
| Concept Preservation | ImageNet 10 Preserved Concepts SD 1.4 | Original Accuracy 85.9 | 15 |
| Concept Erasure | Stable Diffusion Church object v1.4 | ASR 10.66 | 13 |
| Concept Erasure | Stable Diffusion Nudity Concept v1.4 | ASR 189.45 | 12 |
Showing 10 of 33 rows
