
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient

About

Text-to-image diffusion models have achieved remarkable success in generating photorealistic images. However, the inclusion of sensitive information during pre-training poses significant risks. Machine Unlearning (MU) offers a promising solution to eliminate sensitive concepts from these models. Despite its potential, existing MU methods face two main challenges: 1) limited generalization, where concept erasure is effective only within the unlearned set, failing to prevent sensitive concept generation from out-of-set prompts; and 2) utility degradation, where removing target concepts significantly impacts the model's overall performance. To address these issues, we propose a novel concept domain correction framework named DoCo (Domain Correction). By aligning the output domains of sensitive and anchor concepts through adversarial training, our approach ensures comprehensive unlearning of target concepts. Additionally, we introduce a concept-preserving gradient surgery technique that mitigates conflicting gradient components, thereby preserving the model's utility while unlearning specific concepts. Extensive experiments across various instances, styles, and offensive concepts demonstrate the effectiveness of our method in unlearning targeted concepts with minimal impact on related concepts, outperforming previous approaches even for out-of-distribution prompts.
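The abstract does not state the exact gradient surgery rule. As a hedged illustration only, the sketch below assumes a PCGrad-style projection, a technique from multi-task learning that is named here plainly and is not confirmed to be DoCo's formula: when the unlearning gradient and the preservation gradient have a negative inner product (a descent step on one would raise the other loss), the conflicting component is removed by projecting the unlearning gradient onto the plane orthogonal to the preservation gradient. The names `concept_preserving_update`, `g_unlearn`, `g_preserve`, and `dot` are hypothetical, and gradients are modeled as flat Python lists rather than model tensors.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


def concept_preserving_update(
    g_unlearn: Vector, g_preserve: Vector, eps: float = 1e-12
) -> Vector:
    """Drop the part of the unlearning gradient that conflicts with preservation.

    Assumption: a PCGrad-style rule, not necessarily the paper's exact method.
    A negative inner product means the two objectives conflict, so g_unlearn is
    projected onto the plane orthogonal to g_preserve; otherwise it is kept.
    """
    if len(g_unlearn) != len(g_preserve):
        raise ValueError("gradients must have the same length")
    d = dot(g_unlearn, g_preserve)
    if d >= 0:
        # No conflict: descending along g_unlearn does not increase the
        # preservation loss to first order, so keep it unchanged.
        return list(g_unlearn)
    norm_sq = dot(g_preserve, g_preserve)
    if norm_sq < eps:
        # Near-zero preservation gradient: nothing meaningful to project out.
        return list(g_unlearn)
    scale = d / norm_sq
    # Subtract the (negative) component along g_preserve; the result has a
    # zero inner product with g_preserve.
    return [u - scale * p for u, p in zip(g_unlearn, g_preserve)]


if __name__ == "__main__":
    # Conflicting case: the second coordinate fights the preservation direction.
    print(concept_preserving_update([1.0, -1.0], [0.0, 1.0]))  # [1.0, 0.0]
    # Non-conflicting case: returned unchanged.
    print(concept_preserving_update([1.0, 1.0], [0.0, 1.0]))  # [1.0, 1.0]
```

In a real training loop the two gradients would come from separate backward passes over the unlearning loss and a preservation loss on related concepts; this sketch only shows the projection step, under the stated assumption.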

Yongliang Wu, Shiji Zhou, Mingzhuo Yang, Lianzhe Wang, Heng Chang, Wenbo Zhu, Xinting Hu, Xiao Zhou, Xu Yang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Nudity Detection | I2P | Breast (F) Detections | 162 | 29 |
| Nudity Erasure | MMA-Diff tar 1.5 (test) | Nudity Generation Rate | 22.1 | 26 |
| Machine Unlearning | MMA (Target) | Nudity Generation Rate | 22.1 | 24 |
| Nudity Erasure | Ring-a-Bell | Generation Rate | 65.6 | 17 |
| Unlearning | SD Van Gogh Erased Set 2.1 | CS Score | 0.737 | 14 |
| Unlearning | SD Preserved Concepts Set 2.1 | CS Score | 69.1 | 14 |
| Nudity Erasure | MMA-Diff adv. 1.5 (test) | Nudity Generation Rate | 28.7 | 13 |
| Unlearning Nudity | MMA-Diffusion | Targeted Unlearning Efficacy | 22.1 | 13 |
| Nudity Erasure | Ring-A-Bell 1.5 (test) | Nudity Generation Rate | 65.6 | 13 |
| Nudity Erasure | I2P 1.5 (test) | Nudity Generation Rate | 45.1 | 13 |

Showing 10 of 25 rows.
