
Understanding Diffusion Objectives as the ELBO with Simple Data Augmentation

About

To achieve the highest perceptual quality, state-of-the-art diffusion models are optimized with objectives that typically look very different from the maximum likelihood and the Evidence Lower Bound (ELBO) objectives. In this work, we reveal that diffusion model objectives are actually closely related to the ELBO. Specifically, we show that all commonly used diffusion model objectives equate to a weighted integral of ELBOs over different noise levels, where the weighting depends on the specific objective used. Under the condition of monotonic weighting, the connection is even closer: the diffusion objective then equals the ELBO, combined with simple data augmentation, namely Gaussian noise perturbation. We show that this condition holds for a number of state-of-the-art diffusion models. In experiments, we explore new monotonic weightings and demonstrate their effectiveness, achieving state-of-the-art FID scores on the high-resolution ImageNet benchmark.
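The "weighted integral of ELBOs over noise levels" in the abstract can be sketched numerically. Below is a minimal NumPy toy example, assuming an epsilon-prediction loss and a hypothetical monotonic (decreasing in log-SNR) weighting function; the specific `weighting` form and the `eps_model` stub are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def weighting(lam, bias=2.0):
    # Hypothetical monotonic weighting over log-SNR `lam`,
    # decreasing in lam (the condition the paper ties to the ELBO).
    return 1.0 / (1.0 + np.exp(lam - bias))

def weighted_diffusion_loss(x, eps_model, n_samples=1024):
    """Monte Carlo estimate of E_{lam, eps}[ w(lam) * ||eps - eps_hat||^2 ]."""
    lam = rng.uniform(-10.0, 10.0, size=n_samples)   # sampled log-SNR levels
    alpha = np.sqrt(1.0 / (1.0 + np.exp(-lam)))      # alpha^2 = sigmoid(lam)
    sigma = np.sqrt(1.0 - alpha ** 2)                # variance-preserving noise
    losses = []
    for l, a, s in zip(lam, alpha, sigma):
        eps = rng.standard_normal(x.shape)
        z = a * x + s * eps                          # Gaussian-perturbed data
        eps_hat = eps_model(z, l)                    # model's noise estimate
        losses.append(weighting(l) * np.mean((eps - eps_hat) ** 2))
    return float(np.mean(losses))

# Toy "model" that just predicts zero noise, to exercise the estimator:
x = rng.standard_normal(16)
loss = weighted_diffusion_loss(x, lambda z, l: np.zeros_like(z))
```

Each per-noise-level term here is (up to constants) a denoising ELBO term; the monotonic `weighting` makes the overall sum interpretable, per the abstract, as an ELBO under Gaussian-noise data augmentation.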

Diederik P. Kingma, Ruiqi Gao • 2023

Related benchmarks

Task                               | Dataset                        | Metric               | Result | Rank
-----------------------------------|--------------------------------|----------------------|--------|-----
Class-conditional Image Generation | ImageNet 256x256               | Inception Score (IS) | 267.7  | 441
Image Generation                   | ImageNet 256x256 (val)         | FID                  | 2.12   | 307
Class-conditional Image Generation | ImageNet 256x256 (train)       | IS                   | 267.7  | 305
Class-conditional Image Generation | ImageNet 256x256 (val)         | FID                  | 2.12   | 293
Image Generation                   | ImageNet 256x256               | FID                  | 2.12   | 243
Image Generation                   | ImageNet 512x512 (val)         | FID-50K              | 2.65   | 184
Class-conditional Image Generation | ImageNet 256x256 (train val)   | FID                  | 2.12   | 178
Class-conditional Image Generation | ImageNet 64x64                 | FID                  | 1.43   | 126
Image Generation                   | ImageNet 256x256 (train)       | FID                  | 2.4    | 91
Conditional Image Generation       | ImageNet-1K 256x256 (val)      | gFID                 | 2.12   | 86

Showing 10 of 23 rows
