
When Are Concepts Erased From Diffusion Models?

About

In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these approaches remove the target concept from the model. We begin by proposing two conceptual models for the erasure mechanism in diffusion models: (i) interfering with the model's internal guidance processes, and (ii) reducing the unconditional likelihood of generating the target concept, potentially removing it entirely. To assess whether a concept has been truly erased from the model, we introduce a comprehensive suite of independent probing techniques: supplying visual context, modifying the diffusion trajectory, applying classifier guidance, and analyzing the model's alternative generations that emerge in place of the erased concept. Our results shed light on the value of exploring concept erasure robustness outside of adversarial text inputs, and emphasize the importance of comprehensive evaluations for erasure in diffusion models.
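As an illustration of the classifier-guidance probe described above, here is a minimal toy sketch (pure NumPy, not the paper's code): a 1-D stand-in "erased" model steers samples away from the target concept, and adding the gradient of a hypothetical classifier's log-likelihood to the denoising direction pulls generations back toward it. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def erased_model_score(x, t):
    # Toy stand-in for an erased model's denoising direction:
    # it steers samples toward 0, away from the "erased" concept at mean 3.
    return -x

def classifier_log_grad(x, target=3.0):
    # Gradient of log p(concept | x) for a toy Gaussian classifier
    # centered on the erased concept (hypothetical probe).
    return target - x

def sample(steps=50, guidance_scale=0.0, seed=0):
    # Simplified guided sampling loop: the classifier gradient is added
    # to the model's score, scaled by guidance_scale.
    rng = np.random.default_rng(seed)
    x = rng.normal(size=256)
    for t in range(steps):
        score = erased_model_score(x, t) + guidance_scale * classifier_log_grad(x)
        x = x + 0.1 * score + 0.05 * rng.normal(size=x.shape)
    return x

unguided = sample(guidance_scale=0.0)
guided = sample(guidance_scale=2.0)
# If the concept is merely suppressed rather than removed, guidance
# recovers samples closer to the concept than unguided sampling does.
print(abs(guided.mean() - 3.0) < abs(unguided.mean() - 3.0))
```

In the toy dynamics, unguided samples settle near 0 while guided samples settle much closer to the "erased" concept, mirroring how classifier guidance can reveal residual concept knowledge in a suppressed model.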

Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen • 2025

Related benchmarks

| Task               | Dataset          | Result           | Rank |
|--------------------|------------------|------------------|------|
| Semantic Alignment | Nudity-I2P       | CLIP Score 19.83 | 31   |
| Semantic Alignment | Parachute        | CLIP Score 22.05 | 31   |
| Semantic Alignment | Van Gogh         | CLIP Score 18.29 | 31   |
| Semantic Alignment | Church           | CLIP Score 21.16 | 30   |
| Style Unlearning   | Van Gogh style   | ESD 14           | 11   |
| Object Unlearning  | Object-Parachute | ESD 30           | 11   |
| Nudity Unlearning  | I2P              | ESD 51.41        | 11   |
| Object Unlearning  | Object Church    | ESD 48           | 11   |
| Nudity Unlearning  | MMA              | ESD 61.72        | 10   |
| Nudity Unlearning  | ArT              | ESD 7.81         | 10   |
