Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
About
Diffusion probabilistic models (DPMs) have achieved image-generation quality that rivals that of GANs. Unlike GANs, however, DPMs use a set of latent variables that lack semantic meaning and cannot serve as a useful representation for other tasks. This paper explores the possibility of using DPMs for representation learning and seeks to extract a meaningful and decodable representation of an input image via autoencoding. Our key idea is to use a learnable encoder to discover the high-level semantics, and a DPM as the decoder to model the remaining stochastic variations. Our method can encode any image into a two-part latent code: the first part is semantically meaningful and linear, while the second part captures stochastic details, allowing near-exact reconstruction. This capability enables challenging applications that currently foil GAN-based methods, such as attribute manipulation on real images. We also show that this two-level encoding improves denoising efficiency and naturally facilitates various downstream tasks, including few-shot conditional sampling. Please visit our project page: https://Diff-AE.github.io/
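The two-part latent code described above can be sketched in a few lines. The following toy NumPy example is only an illustration of the idea, not the paper's model: the semantic encoder is a hypothetical fixed linear projection standing in for the learned CNN encoder, and `denoiser` is a made-up linear stand-in for the conditional DPM's noise predictor. What it shows is the structure: `z_sem` is the compact semantic half, and the stochastic half `x_T` is obtained by running the conditional generative process forward deterministically (DDIM-style), so that running it in reverse gives near-exact reconstruction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the real model works on image tensors).
D_IMG, D_SEM = 16, 4

# Hypothetical semantic encoder: a fixed linear map standing in for the
# paper's learned encoder that produces the semantic code z_sem.
W_enc = rng.standard_normal((D_SEM, D_IMG)) / np.sqrt(D_IMG)

def encode_semantic(x):
    """First half of the latent code: compact, semantic z_sem."""
    return W_enc @ x

def denoiser(x_t, z_sem, t):
    """Toy stand-in for the conditional DPM's noise predictor.

    A real diffusion autoencoder uses a U-Net conditioned on z_sem;
    here a small linear rule keeps the example self-contained.
    """
    return 0.01 * x_t + 0.05 * (W_enc.T @ z_sem)

def encode_stochastic(x0, z_sem, steps=10):
    """Second half of the latent code: deterministically map x0 to x_T
    by running the (toy) diffusion process forward, DDIM-inversion style."""
    x = x0.copy()
    for t in range(steps):
        x = x + denoiser(x, z_sem, t)   # deterministic forward step
    return x

def decode(x_T, z_sem, steps=10):
    """Reverse the deterministic process to reconstruct the image.

    Like DDIM, the reverse step evaluates the denoiser at the current
    iterate, so reconstruction is near-exact rather than exact.
    """
    x = x_T.copy()
    for t in reversed(range(steps)):
        x = x - denoiser(x, z_sem, t)   # deterministic reverse step
    return x
```

Usage follows the abstract's pipeline: `z_sem = encode_semantic(x0)`, then `x_T = encode_stochastic(x0, z_sem)` gives the full two-part code `(z_sem, x_T)`, and `decode(x_T, z_sem)` recovers `x0` up to a small residual. Editing only `z_sem` while keeping `x_T` fixed is what enables attribute manipulation on real images in the actual model.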
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | CelebA 64x64 (test) | FID | 22.7 | 203 |
| Unconditional Image Generation | CelebA unconditional 64x64 | FID | 4.97 | 95 |
| Unconditional Image Generation | FFHQ 256x256 | FID | 5.81 | 64 |
| Image Reconstruction | CelebA-HQ (test) | -- | -- | 50 |
| Image Generation | CelebA (test) | FID | 22.7 | 49 |
| Image Generation | FFHQ 256x256 (test) | FID | 5.81 | 30 |
| Image Reconstruction | FFHQ No glasses | LPIPS | 0.014 | 18 |
| Image Reconstruction | FFHQ Glasses | LPIPS | 0.014 | 18 |
| Disentanglement | CelebA-HQ (test) | Disentanglement | 64.39 | 13 |
| Image Classification | CelebA-HQ (test) | F1 Score | 68.7 | 13 |