A Unified Framework for Diffusion Model Unlearning with f-Divergence
About
Most existing methods for concept unlearning in text-to-image diffusion models minimize a mean squared error (MSE) loss between the denoiser outputs conditioned on a target and an anchor concept, which is implicitly the KL divergence between two Gaussians. We generalize this objective to any $f$-divergence, recovering MSE as the KL instance, and identify a family of $\alpha$-divergences whose Gaussian closed-form yields cheap, MSE-like training objectives. For the remaining $f$-divergences, we provide a min-max objective based on the variational formulation of the $f$-divergence. We theoretically analyze and numerically validate how different $f$-divergences impact the gradient magnitude and the convergence properties of the algorithm, affecting the quality of unlearning. For instance, we observe that the Hellinger closed-form instance consistently dominates MSE across multiple scenarios. More generally, the proposed unified framework offers a flexible paradigm for selecting the optimal divergence based on the application and user goal, allowing for finer control over the trade-off between unlearning efficacy and generative fidelity.
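The link between MSE and Gaussian divergences mentioned above can be illustrated with a small sketch. It assumes two isotropic Gaussians with a shared standard deviation `sigma`, whose means stand in for the denoiser outputs under the target and anchor concepts. The function names and this parametrization are illustrative assumptions, not the paper's training objective. Under that assumption the KL divergence is a scaled squared error, while the squared Hellinger distance is a bounded, saturating function of the same squared error.

```python
import math


def squared_error(mu_p, mu_q):
    """Squared Euclidean distance between two mean vectors."""
    return sum((a - b) ** 2 for a, b in zip(mu_p, mu_q))


def kl_gaussians(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I)) = ||mu_p - mu_q||^2 / (2 sigma^2)."""
    return squared_error(mu_p, mu_q) / (2.0 * sigma ** 2)


def squared_hellinger_gaussians(mu_p, mu_q, sigma):
    """Squared Hellinger distance for the same pair of Gaussians:
    H^2 = 1 - exp(-||mu_p - mu_q||^2 / (8 sigma^2)), which lies in [0, 1)."""
    return 1.0 - math.exp(-squared_error(mu_p, mu_q) / (8.0 * sigma ** 2))


# Hypothetical stand-ins for denoiser outputs (assumed values, for illustration only).
mu_target = [0.5, -1.0]
mu_anchor = [0.0, 0.0]

kl = kl_gaussians(mu_target, mu_anchor, sigma=1.0)
h2 = squared_hellinger_gaussians(mu_target, mu_anchor, sigma=1.0)
print(f"KL = {kl:.4f}, squared Hellinger = {h2:.4f}")
```

In this toy setting the KL term grows without bound as the two means separate, whereas the Hellinger term saturates below 1, so its gradient with respect to the squared error shrinks for large gaps. This is one simple way to see why the choice of divergence can change gradient magnitude during unlearning.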
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Machine Unlearning | MMA (Target) | Nudity Generation Rate | 3.5 | 24 |
| Nudity Erasure | Ring-a-Bell | Generation Rate | 79.6 | 17 |
| Unlearning | SD Van Gogh Erased Set 2.1 | CS Score | 0.755 | 14 |
| Unlearning | SD Preserved Concepts Set 2.1 | CS Score | 70.2 | 14 |
| Unlearning Nudity | I2P | Nudity Generation Rate | 6.3 | 13 |
| Unlearning Nudity | MMA-Diffusion | Targeted Unlearning Efficacy | 50.4 | 13 |
| Machine Unlearning | I2P | Nudity Generation Rate | 6.3 | 12 |
| Machine Unlearning | MMA (Adversarial) | Nudity Generation Rate | 4.9 | 12 |
| Machine Unlearning | RAB | Nudity Generation Rate | 15.7 | 12 |