Maximum Likelihood Training of Score-Based Diffusion Models
About
Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based diffusion models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based diffusion models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based diffusion models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.83 and 3.76 bits/dim on CIFAR-10 and ImageNet 32x32 without any data augmentation, on a par with state-of-the-art autoregressive models on these tasks.
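The specific weighting scheme is the paper's likelihood weighting, λ(t) = g(t)², where g(t) is the diffusion coefficient of the forward SDE. Below is a minimal sketch of a denoising score matching loss with this weighting for a variance-preserving SDE with a linear β(t) schedule; the function names, the `score_model(x, t)` signature, and the schedule constants are illustrative assumptions, not the authors' released code:

```python
import torch

def vp_coeffs(t, beta_0=0.1, beta_1=20.0):
    """Variance-preserving SDE with beta(t) linear in t on [0, 1] (assumed schedule)."""
    beta_t = beta_0 + t * (beta_1 - beta_0)                 # g(t)^2 for the VP SDE
    log_alpha = -0.5 * t * beta_0 - 0.25 * t**2 * (beta_1 - beta_0)
    alpha_t = torch.exp(log_alpha)                          # mean scaling of x_0
    sigma_t = torch.sqrt(1.0 - alpha_t**2)                  # marginal std of x_t
    return beta_t, alpha_t, sigma_t

def likelihood_weighted_dsm_loss(score_model, x0, eps_t=1e-5):
    """Denoising score matching with likelihood weighting lambda(t) = g(t)^2."""
    t = torch.rand(x0.shape[0], device=x0.device) * (1.0 - eps_t) + eps_t
    beta_t, alpha_t, sigma_t = vp_coeffs(t)
    shape = [-1] + [1] * (x0.dim() - 1)
    noise = torch.randn_like(x0)
    x_t = alpha_t.view(shape) * x0 + sigma_t.view(shape) * noise
    # Conditional score: grad_x log p(x_t | x_0) = -noise / sigma(t).
    target = -noise / sigma_t.view(shape)
    score = score_model(x_t, t)                             # assumed model interface
    per_example = ((score - target) ** 2).flatten(1).sum(dim=1)
    # lambda(t) = g(t)^2 = beta(t): with this weighting the objective upper
    # bounds the negative log-likelihood (up to a constant), per the paper.
    return (beta_t * per_example).mean()
```

With any other weighting the same loss remains a valid score matching objective but loses the bound on the negative log-likelihood, which is why the paper's likelihood weighting matters for density estimation.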
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Generation | CIFAR-10 (test) | FID | 2.2 | 471 |
| Image Generation | CIFAR10 32x32 (test) | FID | 3.98 | 154 |
| Density Estimation | CIFAR-10 (test) | Bits/dim | 2.83 | 134 |
| Image Generation | ImageNet 64x64 | FID | 24.95 | 114 |
| Unconditional Generation | CIFAR-10 (test) | FID | 2.87 | 102 |
| Density Estimation | ImageNet 32x32 (test) | Bits per sub-pixel | 3.76 | 66 |
| Generative Modeling | CIFAR-10 (test) | NLL (bits/dim) | 2.83 | 62 |
| Unconditional Image Generation | CIFAR10 | -- | -- | 33 |
| Likelihood Estimation | CIFAR-10 (test) | NLL (BPD) | 2.9 | 24 |
| Image Generation | ImageNet-32 | FID | 8.31 | 20 |
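Note that "Bits/dim", "NLL (bits/dim)", "NLL (BPD)", and "bits per sub-pixel" all name the same quantity: the negative log-likelihood in base 2, normalized by the number of sub-pixel dimensions. A small conversion helper (illustrative, not part of the paper's code):

```python
import math

def nats_to_bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image NLL in nats to bits per dimension (sub-pixel)."""
    return nll_nats / (num_dims * math.log(2))

# Example: a CIFAR-10 image has 3 * 32 * 32 = 3072 sub-pixels, so an NLL of
# ~6026 nats per image corresponds to 6026 / (3072 * ln 2) ~= 2.83 bits/dim.
print(nats_to_bits_per_dim(6026.0, 3 * 32 * 32))
```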