Consistency-Preserving Concept Erasure via Unsafe-Safe Pairing and Directional Fisher-weighted Adaptation

About

With the increasing versatility of text-to-image diffusion models, the ability to selectively erase undesirable concepts (e.g., harmful content) has become indispensable. However, existing concept erasure approaches primarily focus on removing unsafe concepts without providing guidance toward corresponding safe alternatives, which often leads to failure in preserving the structural and semantic consistency between the original and erased generations. In this paper, we propose a novel framework, PAIRed Erasing (PAIR), which reframes concept erasure from simple removal to consistency-preserving semantic realignment using unsafe-safe pairs. We first generate safe counterparts from unsafe inputs while preserving structural and semantic fidelity, forming paired unsafe-safe multimodal data. Leveraging these pairs, we introduce two key components: (1) Paired Semantic Realignment, a guided objective that uses unsafe-safe pairs to explicitly map target concepts to semantically aligned safe anchors; and (2) Fisher-weighted Initialization for DoRA, which initializes parameter-efficient low-rank adaptation matrices using unsafe-safe pairs, encouraging the generation of safe alternatives while selectively suppressing unsafe concepts. Together, these components enable fine-grained erasure that removes only the targeted concepts while maintaining overall semantic consistency. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art baselines, achieving effective concept erasure while preserving structural integrity, semantic coherence, and generation quality.
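The abstract names its two components without giving formulas, so the following is only a minimal sketch of the general ingredients, not the authors' implementation. It assumes a mean-squared-error form for the paired objective (the adapted model's prediction for an unsafe input is pulled toward a frozen model's prediction for the paired safe input), uses the standard empirical diagonal Fisher estimate (mean of squared per-sample gradients), and shows the standard DoRA reparameterization, where a weight is a per-column magnitude times a column-normalized direction. All function names are hypothetical, and how the Fisher scores enter the DoRA initialization is not specified in the abstract, so that step is left out.

```python
import math


def paired_realignment_loss(pred_unsafe, target_safe):
    """MSE between the adapted model's prediction for an unsafe input and a
    frozen model's prediction for the paired safe input.
    The MSE form is an assumption; the abstract only says "guided objective"."""
    assert len(pred_unsafe) == len(target_safe)
    return sum((p - t) ** 2 for p, t in zip(pred_unsafe, target_safe)) / len(pred_unsafe)


def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: per-parameter mean of squared gradients.
    per_sample_grads is a list of flat gradient vectors, one per sample."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def dora_weight(W0, delta, m):
    """DoRA reparameterization: W = m * V / ||V||_col with V = W0 + delta.
    W0 and delta are lists of rows; delta stands in for the low-rank update
    B @ A, and m holds one learnable magnitude per column."""
    rows, cols = len(W0), len(W0[0])
    V = [[W0[i][j] + delta[i][j] for j in range(cols)] for i in range(rows)]
    norms = [math.sqrt(sum(V[i][j] ** 2 for i in range(rows))) for j in range(cols)]
    return [[m[j] * V[i][j] / norms[j] for j in range(cols)] for i in range(rows)]
```

With a zero update and magnitudes set to the column norms of `W0`, `dora_weight` returns `W0` unchanged, which is the usual DoRA starting point; a Fisher-informed initialization would presumably start from a non-zero `delta` chosen using scores like those from `diagonal_fisher`.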

Yongwoo Kim, Sungmin Cha, Hyunsoo Kim, Jaewon Lee, Donghyun Kim • 2026

Related benchmarks

Task                    | Dataset                                         | Metric          | Result | Rank
Nudity Erasure          | I2P 1.0 (test)                                  | ASR (UD Attack) | 7.37   | 16
Artistic Style Removal  | COCO 10K 2014 (val)                             | FID             | 16.9   | 7
Utility Preservation    | COCO-10K (val)                                  | FID             | 16.93  | 7
Artistic Style Removal  | Target Prompts Van Gogh style v1.4 base (test)  | ASR             | 1      | 7
Nudity Removal          | Adversarial Prompts (test)                      | MMA             | 7.5    | 7
Object Removal          | COCO-10K                                        | FID             | 16.46  | 6
Object Removal          | Target Prompts tench                            | ASR             | 0.4    | 6