A Unified Framework for Diffusion Model Unlearning with f-Divergence

About

Most existing methods for concept unlearning in text-to-image diffusion models minimize a mean squared error (MSE) loss between the denoiser outputs conditioned on a target and an anchor concept, which is implicitly the KL divergence between two Gaussians. We generalize this objective to any $f$-divergence, recovering MSE as the KL instance, and identify a family of $\alpha$-divergences whose Gaussian closed-form yields cheap, MSE-like training objectives. For the remaining $f$-divergences, we provide a min-max objective based on the variational formulation of the $f$-divergence. We theoretically analyze and numerically validate how different $f$-divergences impact the gradient magnitude and the convergence properties of the algorithm, affecting the quality of unlearning. For instance, we observe that the Hellinger closed-form instance consistently dominates MSE across multiple scenarios. More generally, the proposed unified framework offers a flexible paradigm for selecting the optimal divergence based on the application and user goal, allowing for finer control over the trade-off between unlearning efficacy and generative fidelity.
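The link between MSE and KL divergence mentioned above can be checked numerically. The sketch below is illustrative and is not the paper's training objective: it assumes two isotropic Gaussians that share a standard deviation `sigma` and differ only in their means, which stand in for the denoiser outputs under the target and anchor concepts. Under that assumption the KL divergence is the squared distance between the means scaled by $1/(2\sigma^2)$, and the squared Hellinger distance (using the convention that it equals one minus the Bhattacharyya coefficient) also has a closed form. The function names and the example vectors are hypothetical.

```python
import math


def squared_distance(mu_p, mu_q):
    """Squared Euclidean distance between two mean vectors."""
    return sum((a - b) ** 2 for a, b in zip(mu_p, mu_q))


def kl_gaussian(mu_p, mu_q, sigma):
    """KL(N(mu_p, sigma^2 I) || N(mu_q, sigma^2 I)).

    With equal covariances this reduces to a scaled squared error.
    """
    return squared_distance(mu_p, mu_q) / (2.0 * sigma ** 2)


def squared_hellinger_gaussian(mu_p, mu_q, sigma):
    """Squared Hellinger distance for the same pair of Gaussians.

    Convention assumed here: H^2 = 1 - BC, where BC is the
    Bhattacharyya coefficient exp(-||mu_p - mu_q||^2 / (8 sigma^2)).
    """
    return 1.0 - math.exp(-squared_distance(mu_p, mu_q) / (8.0 * sigma ** 2))


# Hypothetical stand-ins for denoiser outputs (not real model outputs).
target = [0.5, -1.0, 2.0]
anchor = [0.0, -1.0, 1.0]
sigma = 1.0

kl = kl_gaussian(target, anchor, sigma)
h2 = squared_hellinger_gaussian(target, anchor, sigma)
print(f"KL = {kl:.4f}, squared Hellinger = {h2:.4f}")
```

In this setting the KL value grows without bound as the means move apart, while the squared Hellinger value saturates at 1, which is one way the choice of divergence can change the size of the training signal.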

Nicola Novello, Federico Fontana, Luigi Cinque, Deniz Gunduz, Andrea M. Tonello • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Machine Unlearning | MMA (Target) | Nudity Generation Rate | 3.5 | 24 |
| Nudity Erasure | Ring-a-Bell | Generation Rate | 79.6 | 17 |
| Unlearning | SD Van Gogh Erased Set 2.1 | CS Score | 0.755 | 14 |
| Unlearning | SD Preserved Concepts Set 2.1 | CS Score | 70.2 | 14 |
| Unlearning Nudity | I2P | Nudity Generation Rate | 6.3 | 13 |
| Unlearning Nudity | MMA-Diffusion | Targeted Unlearning Efficacy | 50.4 | 13 |
| Machine Unlearning | I2P | Nudity Generation Rate | 6.3 | 12 |
| Machine Unlearning | MMA (Adversarial) | Nudity Generation Rate | 4.9 | 12 |
| Machine Unlearning | RAB | Nudity Generation Rate | 15.7 | 12 |