Diffusion Autoencoders: Toward a Meaningful and Decodable Representation

About

Diffusion probabilistic models (DPMs) have achieved remarkable quality in image generation that rivals GANs'. But unlike GANs, DPMs use a set of latent variables that lack semantic meaning and cannot serve as a useful representation for other tasks. This paper explores the possibility of using DPMs for representation learning and seeks to extract a meaningful and decodable representation of an input image via autoencoding. Our key idea is to use a learnable encoder for discovering the high-level semantics, and a DPM as the decoder for modeling the remaining stochastic variations. Our method can encode any image into a two-part latent code, where the first part is semantically meaningful and linear, and the second part captures stochastic details, allowing near-exact reconstruction. This capability enables challenging applications that currently foil GAN-based methods, such as attribute manipulation on real images. We also show that this two-level encoding improves denoising efficiency and naturally facilitates various downstream tasks including few-shot conditional sampling. Please visit our project page: https://Diff-AE.github.io/
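As a rough illustration of the two-part latent code described above, the following sketch shows how attribute manipulation could exploit the linearity of the semantic part: the semantic vector is shifted along a direction while the stochastic part is left untouched. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `LatentCode`, `z_sem`, `x_T`, and `shift_attribute`, the array shapes, and the random stand-in for an attribute direction are all hypothetical; the encoder and the DPM decoder are omitted entirely.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class LatentCode:
    """Hypothetical container for the two-part latent code."""
    z_sem: np.ndarray  # semantic part (assumed here to be a flat vector)
    x_T: np.ndarray    # stochastic part (assumed here to be image-shaped)


def shift_attribute(code: LatentCode, direction: np.ndarray, scale: float) -> LatentCode:
    """Move z_sem along a unit-normalized direction; x_T is reused unchanged."""
    unit = direction / np.linalg.norm(direction)
    return LatentCode(z_sem=code.z_sem + scale * unit, x_T=code.x_T)


rng = np.random.default_rng(0)

# Stand-ins for what an encoder would produce for one image (assumed shapes).
code = LatentCode(z_sem=rng.normal(size=512), x_T=rng.normal(size=(3, 64, 64)))

# Stand-in for an attribute direction, e.g. the weight vector of a linear
# classifier trained on semantic codes (an assumption for illustration).
direction = rng.normal(size=512)

edited = shift_attribute(code, direction, scale=0.3)
```

In a full pipeline, the edited code would then be passed to the DPM decoder to produce the manipulated image; because only `z_sem` changes, the stochastic details carried by `x_T` are preserved.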

Konpat Preechakul, Nattanat Chatthee, Suttisak Wizadwongsa, Supasorn Suwajanakorn • 2021

Related benchmarks

Task | Dataset | Result | Rank
Image Generation | CelebA 64 x 64 (test) | FID 22.7 | 208
Unconditional Image Generation | CelebA unconditional 64 x 64 | FID 4.97 | 95
Unconditional Image Generation | FFHQ 256x256 | FID 5.81 | 80
Face Attribute Editing | CelebA-HQ (test) | FID 20.42 | 56
Image Reconstruction | CelebA-HQ (test) | -- | 50
Image Generation | CelebA (test) | FID 22.7 | 49
Image Generation | FFHQ 256x256 (test) | FID 5.81 | 38
Disentanglement | Shapes3D (test) | DCI 0.0653 | 28
Image Reconstruction | FFHQ No glasses | LPIPS 0.014 | 18
Image Reconstruction | FFHQ Glasses | LPIPS 0.014 | 18
Showing 10 of 18 rows
