CoreUnlearn: Rethinking Concept Unlearning through Disentangled Component-Level Erasure in Text-guided Diffusion Models

About

Text-guided diffusion models have revolutionized image synthesis but also raise ethical concerns, such as privacy violations and harmful content generation. To mitigate these issues, prevailing methods typically use an alignment mechanism with predefined erasure references to fine-tune pretrained model weights. However, these techniques are intrinsically limited by the representational capacity of the textual space and are highly sensitive to the choice of predefined erasure references; for example, a suboptimal reference may significantly degrade the model's utility during erasure. To overcome these limitations, we introduce CoreUnlearn, which aims to disentangle and remove the erasure-critical component of the undesirable concept. Specifically, CoreUnlearn comprises a Component Extraction Module (CEM) and a Swap Disentangling Strategy (SDS). Guided by SDS, CEM is pre-trained to decompose concept embeddings into distinct component types. Leveraging this decomposition, CoreUnlearn then fine-tunes the model weights to remove the erasure-critical component while retaining the non-critical ones. Extensive experiments demonstrate that CoreUnlearn achieves effective concept erasure with minimal impact on overall model performance.
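The decompose, swap, and erase idea described above can be illustrated with a toy sketch. This is not the paper's CEM or SDS: the abstract does not specify their architecture or losses, so a fixed linear projection onto a hypothetical "critical direction" stands in for the learned decomposition, and the example only manipulates plain vectors rather than fine-tuning diffusion model weights. All function names here are assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def split_components(embedding: Vector, critical_direction: Vector) -> Tuple[Vector, Vector]:
    """Split an embedding into (critical, residual) parts.

    ASSUMPTION: a fixed orthogonal projection stands in for a learned
    component extractor. 'critical' is the part along critical_direction,
    'residual' is everything orthogonal to it.
    """
    norm_sq = dot(critical_direction, critical_direction)
    if norm_sq == 0:
        raise ValueError("critical_direction must be non-zero")
    scale = dot(embedding, critical_direction) / norm_sq
    critical = [scale * d for d in critical_direction]
    residual = [e - c for e, c in zip(embedding, critical)]
    return critical, residual


def swap_critical(emb_a: Vector, emb_b: Vector, direction: Vector) -> Tuple[Vector, Vector]:
    """Exchange the critical parts of two embeddings, keeping each residual.

    ASSUMPTION: a hypothetical stand-in for a swap-based disentangling check.
    If the split is clean, swapping should transfer only the critical content.
    """
    crit_a, res_a = split_components(emb_a, direction)
    crit_b, res_b = split_components(emb_b, direction)
    swapped_a = [c + r for c, r in zip(crit_b, res_a)]
    swapped_b = [c + r for c, r in zip(crit_a, res_b)]
    return swapped_a, swapped_b


def erase_critical(embedding: Vector, direction: Vector) -> Vector:
    """Drop the critical part and keep the non-critical residual."""
    _, residual = split_components(embedding, direction)
    return residual


if __name__ == "__main__":
    direction = [1.0, 0.0, 0.0]   # hypothetical erasure-critical axis
    concept = [2.0, 3.0, 4.0]     # toy "undesirable concept" embedding
    other = [5.0, 6.0, 7.0]       # toy unrelated concept embedding
    print(split_components(concept, direction))
    print(swap_critical(concept, other, direction))
    print(erase_critical(concept, direction))
```

In this toy setting, erasing the critical part leaves the residual coordinates untouched, which mirrors the stated goal of removing only the erasure-critical component while retaining the rest.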

Mengnan Zhao, Lihe Zhang, Baocai Yin • 2026

Related benchmarks

Task	Dataset	Metric	Value
Explicit Content Removal	I2P	Buttocks Count	2	47
Object Unlearning	Imagenette Object Unlearning Subset	ERASE FID	259.5	16
Style Unlearning	Artistic Styles SD v1.4 (test)	ERASE FID	294.2	16
Concept Unlearning	I2P Stable Diffusion v1.4	Erase ACC	12.65	7
Object Unlearning	SD 2	ERASE FID	244.6	5
Style Unlearning	SD 2	ERASE FID	212	5

