Latent Diffusion Inversion Requires Understanding the Latent Space
About
The recovery of training data from generative models ("model inversion") has been extensively studied for diffusion models in the data domain as a memorization/overfitting phenomenon. Latent diffusion models (LDMs), which operate on latent codes produced by encoder/decoder pairs, have proven robust to prior inversion methods. In this work we describe two key findings: (1) the diffusion model exhibits non-uniform memorization across latent codes, tending to overfit samples located in high-distortion regions of the decoder pullback metric; (2) even within a single latent code, memorization contributions are unequal across representation dimensions. We propose a method that ranks latent dimensions by their contribution to the decoder pullback metric, which in turn identifies the dimensions that contribute to memorization. For score-based membership inference, a sub-task of model inversion, we find that removing less-memorizing dimensions improves performance for all tested methods and datasets, with average AUROC gains of 1-4% and substantial increases in TPR@1%FPR (1-32%) across diverse datasets including CIFAR-10, CelebA, ImageNet-1K, Pokemon, MS-COCO, and Flickr. Our results highlight the overlooked influence of auto-encoder geometry on LDM memorization and provide a new perspective for analyzing privacy risks in diffusion-based generative models.
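The core idea above, ranking latent dimensions by their contribution to the decoder pullback metric G = Jᵀ J (with J the decoder Jacobian at a latent code), can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy `decoder`, the finite-difference Jacobian, and the `keep_frac` parameter are all assumptions standing in for the trained LDM decoder and the paper's actual ranking procedure.

```python
import numpy as np

# Toy stand-in "decoder": maps a latent code z in R^d to a data point in R^D.
# The real method would use the trained auto-encoder's decoder network here.
def decoder(z, W):
    return np.tanh(W @ z)

def pullback_dimension_scores(z, W, eps=1e-4):
    """Score each latent dimension by its contribution to the decoder
    pullback metric G = J^T J, where J is the decoder Jacobian at z.
    Dimension i gets the diagonal entry G_ii = ||J[:, i]||^2."""
    d = z.shape[0]
    base = decoder(z, W)
    J = np.zeros((base.shape[0], d))
    for i in range(d):  # finite-difference Jacobian, one column at a time
        dz = np.zeros(d)
        dz[i] = eps
        J[:, i] = (decoder(z + dz, W) - base) / eps
    G = J.T @ J          # pullback metric at z
    return np.diag(G)    # per-dimension distortion scores

def select_memorizing_dims(scores, keep_frac=0.5):
    """Keep the highest-scoring (most distortion-contributing) dimensions
    and drop the rest before running a score-based membership test."""
    k = max(1, int(len(scores) * keep_frac))
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
d, D = 8, 32
W = rng.normal(size=(D, d))
W[:, 0] *= 5.0           # make dimension 0 contribute heavy distortion
z = rng.normal(size=d)

scores = pullback_dimension_scores(z, W)
kept = select_memorizing_dims(scores, keep_frac=0.25)
print("kept dimensions:", kept)
```

A score-based membership-inference statistic would then be computed only over the `kept` dimensions of the diffusion model's score/loss, discarding the low-distortion dimensions that contribute little to memorization.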
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Membership Inference Attack | CIFAR-10 | AUC 91.26 | 107 |
| Membership Inference Attack | Flickr (test) | AUC 74.16 | 21 |
| Membership Inference Attack | CelebA | AUC 88.18 | 9 |
| Membership Inference Attack | ImageNet | AUC 72.55 | 9 |
| Membership Inference Attack | Pokemon (test) | AUC 96.23 | 9 |
| Membership Inference Attack | MS-COCO (test) | AUC 96.86 | 9 |