Latent Diffusion Inversion Requires Understanding the Latent Space
About
The recovery of training data from generative models ("model inversion") has been extensively studied for diffusion models in the data domain as a memorization/overfitting phenomenon. Latent diffusion models (LDMs), which operate on latent codes produced by encoder/decoder pairs, have proven robust to prior inversion methods. In this work we describe two key findings: (1) the diffusion model exhibits non-uniform memorization across latent codes, tending to overfit samples located in high-distortion regions of the decoder pullback metric; (2) even within a single latent code, memorization contributions are unequal across representation dimensions. We propose a method that ranks latent dimensions by their contribution to the decoder pullback metric, which in turn identifies the dimensions that contribute to memorization. For score-based membership inference, a sub-task of model inversion, we find that removing less-memorizing dimensions improves performance for all tested methods and datasets, with average AUROC gains of 1-4% and substantial increases in TPR@1%FPR (1-32%) across diverse datasets including CIFAR-10, CelebA, ImageNet-1K, Pokemon, MS-COCO, and Flickr. Our results highlight the overlooked influence of auto-encoder geometry on LDM memorization and provide a new perspective for analyzing privacy risks in diffusion-based generative models.
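The core idea above, ranking latent dimensions by their contribution to the decoder pullback metric G = Jᵀ J (with J the decoder Jacobian at a latent code), can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy `decoder`, the finite-difference Jacobian, and the `keep_frac` parameter are all assumptions standing in for the trained LDM decoder and the paper's actual ranking procedure.

```python
import numpy as np

# Toy stand-in "decoder": maps a latent code z in R^d to a data point in R^D.
# The real method would use the trained auto-encoder's decoder network here.
def decoder(z, W):
    return np.tanh(W @ z)

def pullback_dimension_scores(z, W, eps=1e-4):
    """Score each latent dimension by its contribution to the decoder
    pullback metric G = J^T J, where J is the decoder Jacobian at z.
    Dimension i gets the diagonal entry G_ii = ||J[:, i]||^2."""
    d = z.shape[0]
    base = decoder(z, W)
    J = np.zeros((base.shape[0], d))
    for i in range(d):  # finite-difference Jacobian, one column at a time
        dz = np.zeros(d)
        dz[i] = eps
        J[:, i] = (decoder(z + dz, W) - base) / eps
    G = J.T @ J          # pullback metric at z
    return np.diag(G)    # per-dimension distortion scores

def select_memorizing_dims(scores, keep_frac=0.5):
    """Keep the highest-scoring (most distortion-contributing) dimensions
    and drop the rest before running a score-based membership test."""
    k = max(1, int(len(scores) * keep_frac))
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
d, D = 8, 32
W = rng.normal(size=(D, d))
W[:, 0] *= 5.0           # make dimension 0 contribute heavy distortion
z = rng.normal(size=d)

scores = pullback_dimension_scores(z, W)
kept = select_memorizing_dims(scores, keep_frac=0.25)
print("kept dimensions:", kept)
```

A score-based membership-inference statistic would then be computed only over the `kept` dimensions of the diffusion model's score/loss, discarding the low-distortion dimensions that contribute little to memorization.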
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Membership Inference Attack | CIFAR-10 | AUC 91.26 | 107 |
| Membership Inference Attack | Flickr (test) | AUC 74.16 | 21 |
| Membership Inference Attack | CelebA | AUC 88.18 | 9 |
| Membership Inference Attack | ImageNet | AUC 72.55 | 9 |
| Membership Inference Attack | Pokemon (test) | AUC 96.23 | 9 |
| Membership Inference Attack | MS-COCO (test) | AUC 96.86 | 9 |