
Diffusion Autoencoders: Toward a Meaningful and Decodable Representation

About

Diffusion probabilistic models (DPMs) have achieved remarkable quality in image generation that rivals GANs'. But unlike GANs, DPMs use a set of latent variables that lack semantic meaning and cannot serve as a useful representation for other tasks. This paper explores the possibility of using DPMs for representation learning and seeks to extract a meaningful and decodable representation of an input image via autoencoding. Our key idea is to use a learnable encoder for discovering the high-level semantics, and a DPM as the decoder for modeling the remaining stochastic variations. Our method can encode any image into a two-part latent code, where the first part is semantically meaningful and linear, and the second part captures stochastic details, allowing near-exact reconstruction. This capability enables challenging applications that currently foil GAN-based methods, such as attribute manipulation on real images. We also show that this two-level encoding improves denoising efficiency and naturally facilitates various downstream tasks including few-shot conditional sampling. Please visit our project page: https://Diff-AE.github.io/
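The two-part encoding described above can be illustrated with a toy sketch. Everything below is a stand-in for illustration only: the real Diff-AE uses a learned CNN semantic encoder and a conditional UNet noise predictor, whereas here both are small random linear maps, and the image is a 16-dimensional vector. What the sketch does preserve is the actual mechanism: a semantic code `z_sem` from the encoder, plus a stochastic code `x_T` obtained by running the deterministic (eta = 0) DDIM process forward, so that running it backward conditioned on the same `z_sem` reconstructs the input almost exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: flattened "image" of size 16, semantic code of size 4, 10 steps.
D_IMG, D_SEM, T = 16, 4, 10

# Toy "semantic encoder": a fixed random projection (a learned CNN in Diff-AE).
W_enc = rng.standard_normal((D_SEM, D_IMG)) / np.sqrt(D_IMG)

def encode_semantic(x0):
    # Part 1 of the latent code: compact, semantically meaningful.
    return W_enc @ x0

# Toy noise predictor eps(x_t, t, z_sem): a small random linear map standing in
# for the conditional UNet, just so the DDIM loops run end to end.
W_eps = rng.standard_normal((D_IMG, D_IMG + D_SEM + 1)) * 0.001

def eps_model(x_t, t, z_sem):
    return W_eps @ np.concatenate([x_t, z_sem, [t / T]])

# A simple decreasing alpha-bar schedule for the toy diffusion process.
alpha_bar = np.linspace(0.999, 0.1, T + 1)

def ddim_step(x, t_from, t_to, z_sem):
    """One deterministic DDIM transition (eta = 0) between two timesteps."""
    a_from, a_to = alpha_bar[t_from], alpha_bar[t_to]
    eps = eps_model(x, t_from, z_sem)
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def encode_stochastic(x0, z_sem):
    """Part 2 of the latent code: run DDIM forward (t: 0 -> T) to get x_T."""
    x = x0.copy()
    for t in range(T):
        x = ddim_step(x, t, t + 1, z_sem)
    return x

def decode(x_T, z_sem):
    """Run DDIM backward (t: T -> 0) to reconstruct the image."""
    x = x_T.copy()
    for t in range(T, 0, -1):
        x = ddim_step(x, t, t - 1, z_sem)
    return x

x0 = rng.standard_normal(D_IMG)
z_sem = encode_semantic(x0)           # semantic sub-code
x_T = encode_stochastic(x0, z_sem)    # stochastic sub-code
x_rec = decode(x_T, z_sem)

# Reconstruction is near-exact: only DDIM discretization error remains.
print("max reconstruction error:", np.max(np.abs(x_rec - x0)))
```

Because the forward pass evaluates the noise predictor at `(x_t, t)` while the backward pass evaluates it at `(x_{t+1}, t+1)`, the round trip is only approximately invertible; the residual shrinks with more timesteps, which is why the paper reports near-exact rather than exact reconstruction.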

Konpat Preechakul, Nattanat Chatthee, Suttisak Wizadwongsa, Supasorn Suwajanakorn · 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Generation | CelebA 64×64 (test) | FID 22.7 | 203 |
| Unconditional Image Generation | CelebA unconditional 64×64 | FID 4.97 | 95 |
| Unconditional Image Generation | FFHQ 256×256 | FID 5.81 | 64 |
| Image Reconstruction | CelebA-HQ (test) | -- | 50 |
| Image Generation | CelebA (test) | FID 22.7 | 49 |
| Image Generation | FFHQ 256×256 (test) | FID 5.81 | 30 |
| Image Reconstruction | FFHQ No glasses | LPIPS 0.014 | 18 |
| Image Reconstruction | FFHQ Glasses | LPIPS 0.014 | 18 |
| Disentanglement | CelebA-HQ (test) | Disentanglement 64.39 | 13 |
| Image Classification | CelebA-HQ (test) | F1 Score 68.7 | 13 |

Showing 10 of 14 rows.
