Elucidating the Design Space of Diffusion-Based Generative Models

About

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.
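The figure of 35 network evaluations per image can be checked with a little arithmetic. The sketch below assumes a deterministic second-order (Heun) sampler that calls the network twice per step, except on the final step to zero noise, where it calls it once. The step count of 18, the schedule parameters (sigma_min=0.002, sigma_max=80, rho=7), and the names `karras_sigmas`, `heun_sample`, and `toy_denoiser` are assumptions for illustration and are not stated in the abstract. The scalar denoiser is a stand-in for a trained network.

```python
import math

def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Return n decreasing noise levels followed by a final 0 (assumed schedule)."""
    lo, hi = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    sigmas = [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]
    return sigmas + [0.0]

def heun_sample(denoise, x, sigmas):
    """Deterministic second-order sampler; returns (sample, denoiser calls)."""
    evals = 0
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        # Euler step along the probability-flow ODE: dx/dsigma = (x - D(x, sigma)) / sigma
        d_cur = (x - denoise(x, s_cur)) / s_cur
        evals += 1
        x_next = x + (s_next - s_cur) * d_cur
        if s_next > 0:
            # Second-order correction; skipped on the last step, where sigma reaches 0
            d_next = (x_next - denoise(x_next, s_next)) / s_next
            evals += 1
            x_next = x + (s_next - s_cur) * 0.5 * (d_cur + d_next)
        x = x_next
    return x, evals

def toy_denoiser(x, sigma, sigma_data=0.5):
    """Ideal denoiser for scalar data drawn from N(0, sigma_data^2); hypothetical stand-in."""
    return x * sigma_data**2 / (sigma_data**2 + sigma**2)

sigmas = karras_sigmas(18)
x0 = sigmas[0] * 1.0  # fixed stand-in for an initial noise draw
sample, nfe = heun_sample(toy_denoiser, x0, sigmas)
print(nfe)  # 2 * 18 - 1 = 35
```

Under these assumptions, N steps cost 2N - 1 evaluations, so 18 steps give the 35 evaluations quoted above.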

Tero Karras, Miika Aittala, Timo Aila, Samuli Laine • 2022

Related benchmarks

Task	Dataset	Result
Image Classification	CIFAR-10 (test)	Accuracy 95.81	3381
Image Generation	CIFAR-10 (test)	FID 1.85	536
Unconditional Image Generation	CIFAR-10	FID 1.91	280
Unconditional Image Generation	CIFAR-10 (test)	FID 1.97	223
Image Generation	CIFAR-10	FID 1.84	212
Unconditional Image Generation	CIFAR-10 unconditional	FID 1.77	209
Image Generation	CelebA 64 x 64 (test)	FID 10	208
Image Generation	CIFAR10 32x32 (test)	FID 1.99	186
Class-conditional Image Generation	ImageNet 64x64	FID 1.36	170
Image Generation	CIFAR-10 32x32	FID 1.96	151

Showing 10 of 111 rows
