
D2C: Diffusion-Decoding Models for Few-Shot Conditional Generation

About

Conditional generative models of high-dimensional images have many applications, but supervision signals from conditions to images can be expensive to acquire. This paper describes Diffusion-Decoding models with Contrastive representations (D2C), a paradigm for training unconditional variational autoencoders (VAEs) for few-shot conditional image generation. D2C uses a learned diffusion-based prior over the latent representations to improve generation, and contrastive self-supervised learning to improve representation quality. D2C can adapt to novel generation tasks conditioned on labels or manipulation constraints by learning from as few as 100 labeled examples. On conditional generation from new labels, D2C achieves superior performance over state-of-the-art VAEs and diffusion models. On conditional image manipulation, D2C generations are two orders of magnitude faster to produce than StyleGAN2 ones and are preferred by 50-60% of the human evaluators in a double-blind study.
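To make the abstract's three ingredients concrete, here is a minimal, hypothetical PyTorch sketch of the D2C idea: an encoder trained with a contrastive loss, a diffusion prior fit over the latents (not the pixels), and few-shot label conditioning via a small classifier on those latents. All module names, sizes, and the rejection-style conditional sampler are illustrative assumptions, not the paper's actual implementation (the paper uses an NVAE-style autoencoder and a much larger latent diffusion model).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes only; not the paper's architecture.
LATENT_DIM, N_STEPS = 128, 1000

class Encoder(nn.Module):
    """Stand-in for the D2C encoder: image -> latent z."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, LATENT_DIM),
        )

    def forward(self, x):
        return self.net(x)

class LatentDenoiser(nn.Module):
    """eps_theta(z_t, t): the diffusion prior runs over latents, not pixels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + 1, 512), nn.SiLU(),
            nn.Linear(512, LATENT_DIM),
        )

    def forward(self, z_t, t):
        t_emb = t.float().unsqueeze(1) / N_STEPS  # crude timestep embedding
        return self.net(torch.cat([z_t, t_emb], dim=1))

def contrastive_loss(z1, z2, tau=0.1):
    """NT-Xent-style loss on latents of two augmented views of each image."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                           # (B, B) similarities
    labels = torch.arange(z1.size(0), device=z1.device)  # diagonal positives
    return F.cross_entropy(logits, labels)

def diffusion_prior_loss(denoiser, z0, alphas_cumprod):
    """Standard epsilon-prediction objective, applied to clean latents z0."""
    t = torch.randint(0, N_STEPS, (z0.size(0),), device=z0.device)
    a = alphas_cumprod[t].unsqueeze(1)
    eps = torch.randn_like(z0)
    z_t = a.sqrt() * z0 + (1 - a).sqrt() * eps           # forward-diffuse z0
    return F.mse_loss(denoiser(z_t, t), eps)

def few_shot_conditional_sample(sample_latents, classifier, decoder, label, n=64):
    """One simple conditional sampler (an assumption, not the paper's exact
    procedure): draw latents from the diffusion prior, keep those a small
    classifier (fit on ~100 labeled latents) assigns to the target label,
    and decode the survivors."""
    z = sample_latents(n)                                # (n, LATENT_DIM)
    keep = classifier(z).argmax(dim=1) == label
    return decoder(z[keep])
```

Because the few-shot supervision touches only the low-dimensional latents, the classifier needs very little labeled data, while the frozen diffusion prior and decoder continue to carry the image quality.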

Abhishek Sinha, Jiaming Song, Chenlin Meng, Stefano Ermon • 2021

Related benchmarks

Task                            | Dataset                    | Result    | Rank
--------------------------------|----------------------------|-----------|-----
Unconditional Image Generation  | CIFAR-10 (test)            | FID 10.15 | 216
Image Generation                | CelebA 64x64 (test)        | --        | 203
Unconditional Image Generation  | CIFAR-10 unconditional     | FID 10.15 | 159
Unconditional Image Generation  | CelebA unconditional 64x64 | FID 5.15  | 95
Unconditional Image Generation  | FFHQ 256x256               | FID 7.94  | 64
Image Generation                | FFHQ                       | FID 13.04 | 52
Image Generation                | CelebA-HQ 256x256          | FID 18.74 | 51
Few-shot conditional generation | CelebA-64 (train)          | FID 8.94  | 40
Image Generation                | CelebA-HQ 256x256 (test)   | FID 18.74 | 34
Image Generation                | CelebA-64                  | FID 5.7   | 31

Showing 10 of 16 rows.
