VIPaint: Image Inpainting with Pre-Trained Diffusion Models via Variational Inference
About
Diffusion probabilistic models learn to remove noise added during training, generating novel data (e.g., images) from Gaussian noise through sequential denoising. However, conditioning the generative process on corrupted or masked images is challenging. While various methods have been proposed for inpainting masked images with diffusion priors, they often fail to produce samples from the true conditional distribution, especially for large masked regions. Many baselines also cannot be applied to latent diffusion models, which generate high-quality images at much lower computational cost. We propose a hierarchical variational inference algorithm that optimizes a non-Gaussian Markov approximation of the true diffusion posterior. Our VIPaint method outperforms existing approaches to inpainting, producing diverse high-quality imputations even for state-of-the-art text-conditioned latent diffusion models, and is also effective for other inverse problems such as deblurring and super-resolution.
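
To make the inpainting setup concrete, here is a minimal sketch of the two ingredients the abstract refers to: noise being added to a clean image, and an observation in which part of the image is masked out. It is not the VIPaint algorithm. The function names, the `alpha_bar_t` noise-level parameter, and the standard DDPM-style noising formula are assumptions made for illustration; this page does not specify them.

```python
import numpy as np

def forward_noise(x0, alpha_bar_t, rng):
    """Noise a clean image x0 to a given noise level (assumed DDPM-style form).

    alpha_bar_t in [0, 1]: 1.0 leaves x0 unchanged, 0.0 gives pure Gaussian noise.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def mask_image(x0, mask):
    """Inpainting observation: keep pixels where mask == 1, zero out the rest."""
    return mask * x0

# Hypothetical 4x4 grayscale image with the right half masked out.
rng = np.random.default_rng(0)
x0 = np.ones((4, 4))
mask = np.zeros((4, 4))
mask[:, :2] = 1.0

y = mask_image(x0, mask)                       # what the inpainting method observes
x_t = forward_noise(x0, alpha_bar_t=0.5, rng=rng)  # a partially noised image
```

Inpainting then amounts to sampling plausible values for the pixels where `mask == 0`, conditioned on the observed pixels in `y`.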
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Super-Resolution | ImageNet 256 | PSNR | 18.9 | 50 |
| Image Inpainting | ImageNet 64x64 (test) | PSNR | 13.33 | 16 |
| Image Inpainting | ImageNet64 Random Mask (test) | PSNR | 13.33 | 8 |
| Image Inpainting | ImageNet64 Rotated Window Mask (test) | PSNR | 9.24 | 8 |
| Image Inpainting | LSUN Churches 256 Random Mask | LPIPS | 0.44 | 7 |
| Gaussian deblur | ImageNet 64 | LPIPS | 0.31 | 6 |
| Image Inpainting | ImageNet256 Rotated Window | PSNR | 9.43 | 5 |
| Image Inpainting | ImageNet256 Random Mask | PSNR | 10.04 | 5 |
| Image Inpainting | LSUN Churches 256 Rotated Window | PSNR | 8.39 | 5 |
| Image Inpainting | LSUN-Churches256 Small Mask | PSNR | 16.18 | 5 |
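
For readers unfamiliar with the PSNR figures above, the following is a minimal sketch of how peak signal-to-noise ratio is conventionally computed between a reference image and a reconstruction. Higher PSNR is better, whereas lower LPIPS is better. The function name, the `max_value` parameter, and the assumption that pixels lie in [0, 1] are illustrative; this page does not state the pixel range used, nor whether the benchmarks score the whole image or only the masked region.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    max_value is the largest possible pixel value (assumed 1.0 here;
    use 255 for 8-bit images).
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Hypothetical example: a constant error of 0.1 on every pixel gives MSE = 0.01.
reference = np.zeros((8, 8))
estimate = np.full((8, 8), 0.1)
value = psnr(reference, estimate)
```

Because PSNR is logarithmic in the mean squared error, a 10 dB drop corresponds to a tenfold increase in MSE, which is worth keeping in mind when comparing rows with different mask types.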