A Unified Framework for Diffusion Model Unlearning with f-Divergence

About

Most existing methods for concept unlearning in text-to-image diffusion models minimize a mean squared error (MSE) loss between the denoiser outputs conditioned on a target concept and on an anchor concept, which is implicitly (up to a constant factor) the KL divergence between two Gaussians with equal variance. We generalize this objective to any $f$-divergence, recovering MSE as the KL instance, and identify a family of $\alpha$-divergences whose closed forms for Gaussians yield cheap, MSE-like training objectives. For the remaining $f$-divergences, we provide a min-max objective based on the variational formulation of the $f$-divergence. We theoretically analyze and numerically validate how different $f$-divergences affect the gradient magnitude and the convergence properties of the algorithm, and hence the quality of unlearning. For instance, we observe that the Hellinger closed-form instance consistently outperforms MSE across multiple scenarios. More generally, the proposed unified framework offers a flexible paradigm for selecting the divergence best suited to the application and user goal, allowing finer control over the trade-off between unlearning efficacy and generative fidelity.
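The Gaussian identities behind this framework can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the two denoiser outputs are treated as the means of isotropic Gaussians with a shared variance `sigma2` (a hypothetical parameter). It also uses the Amari parameterization of the $\alpha$-divergence, since the paper's exact parameterization is not stated here, and uses the convention $f(t) = (\sqrt{t} - 1)^2$ for Hellinger. All function names are hypothetical. The last function illustrates, for the KL case only, the variational lower bound $\mathbb{E}_P[T] - \mathbb{E}_Q[f^*(T)]$ that a min-max objective maximizes over a critic $T$.

```python
import math
import random
from typing import Callable, Sequence


def squared_distance(mu_1: Sequence[float], mu_2: Sequence[float]) -> float:
    """Squared Euclidean distance ||mu_1 - mu_2||^2 between two flattened outputs."""
    if len(mu_1) != len(mu_2):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(mu_1, mu_2))


def kl_gaussian(mu_1, mu_2, sigma2=1.0):
    """KL(N(mu_1, sigma2 I) || N(mu_2, sigma2 I)) = ||mu_1 - mu_2||^2 / (2 sigma2).

    Assumption: a shared isotropic variance, so the KL is just a scaled MSE.
    """
    return squared_distance(mu_1, mu_2) / (2.0 * sigma2)


def alpha_divergence_gaussian(mu_1, mu_2, alpha, sigma2=1.0):
    """Amari alpha-divergence between the same two Gaussians (closed form).

    D_alpha = (1 - exp(-c ||d||^2 / (2 sigma2))) / c with c = alpha (1 - alpha).
    With equal covariances it tends to the KL divergence as alpha -> 0 or 1.
    """
    c = alpha * (1.0 - alpha)
    if c == 0.0:
        return kl_gaussian(mu_1, mu_2, sigma2)
    x = c * squared_distance(mu_1, mu_2) / (2.0 * sigma2)
    return -math.expm1(-x) / c  # (1 - exp(-x)) / c, computed stably


def hellinger_gaussian(mu_1, mu_2, sigma2=1.0):
    """f-divergence with f(t) = (sqrt(t) - 1)^2 for equal isotropic covariances.

    Equals 2 (1 - exp(-||d||^2 / (8 sigma2))), i.e. half the alpha = 1/2 case.
    """
    return -2.0 * math.expm1(-squared_distance(mu_1, mu_2) / (8.0 * sigma2))


def gradient_weight(d2, alpha, sigma2=1.0):
    """Ratio dD_alpha/d(d2) : dKL/d(d2), where d2 = ||mu_1 - mu_2||^2.

    Equals exp(-alpha (1 - alpha) d2 / (2 sigma2)): below 1 for alpha in (0, 1),
    so the gradient shrinks relative to MSE as the outputs move apart.
    """
    c = alpha * (1.0 - alpha)
    return math.exp(-c * d2 / (2.0 * sigma2))


def variational_kl_bound(critic: Callable[[float], float],
                         samples_p: Sequence[float],
                         samples_q: Sequence[float]) -> float:
    """Monte Carlo estimate of the lower bound E_P[T] - E_Q[exp(T - 1)] on KL(P || Q).

    exp(t - 1) is the convex conjugate of f(t) = t log t. In a min-max objective
    the critic T is maximised while the model is trained to lower the bound.
    """
    e_p = sum(critic(x) for x in samples_p) / len(samples_p)
    e_q = sum(math.exp(critic(x) - 1.0) for x in samples_q) / len(samples_q)
    return e_p - e_q


if __name__ == "__main__":
    # Hypothetical toy "denoiser outputs" for a target and an anchor prompt.
    target, anchor = [0.3, -1.2, 0.8], [0.1, -0.2, 0.5]
    print("KL / scaled MSE:", kl_gaussian(target, anchor))
    print("alpha = 0.5:", alpha_divergence_gaussian(target, anchor, 0.5))
    print("Hellinger:", hellinger_gaussian(target, anchor))

    # 1-D check: P = N(0, 1), Q = N(1, 1), closed-form KL = 0.5.
    rng = random.Random(0)
    p = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    q = [rng.gauss(1.0, 1.0) for _ in range(100_000)]
    # Optimal critic T*(x) = 1 + log(p(x) / q(x)) = 1.5 - x for these Gaussians.
    print("variational estimate:", variational_kl_bound(lambda x: 1.5 - x, p, q))
```

With the optimal critic, the variational estimate should land close to the closed-form value of 0.5. A suboptimal critic such as a constant yields a smaller value, which is why the critic is maximized in a min-max scheme. Under the stated assumptions, `gradient_weight` shows one way the choice of divergence can rescale the MSE gradient. The paper's own gradient analysis may be formulated differently.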

Nicola Novello, Federico Fontana, Luigi Cinque, Deniz Gunduz, Andrea M. Tonello · 2025

Related benchmarks

Task	Dataset	Metric	Result
Machine Unlearning	MMA (Target)	Nudity Generation Rate	3.5	24
Nudity Erasure	Ring-a-Bell	Generation Rate	79.6	17
Unlearning	SD Van Gogh Erased Set 2.1	CS Score	0.755	14
Unlearning	SD Preserved Concepts Set 2.1	CS Score	70.2	14
Unlearning Nudity	I2P	Nudity Generation Rate	6.3	13
Unlearning Nudity	MMA-Diffusion	Targeted Unlearning Efficacy	50.4	13
Machine Unlearning	I2P	Nudity Generation Rate	6.3	12
Machine Unlearning	MMA (Adversarial)	Nudity Generation Rate	4.9	12
Machine Unlearning	RAB	Nudity Generation Rate	15.7	12

