Inverse problems with diffusion models: MAP estimation via mode-seeking loss
About
A pre-trained unconditional diffusion model, combined with posterior sampling or maximum a posteriori (MAP) estimation techniques, can solve arbitrary inverse problems without task-specific training or fine-tuning. However, existing posterior sampling and MAP estimation methods often rely on modeling approximations and can also be computationally demanding. In this work, we propose a new MAP estimation strategy for solving inverse problems with a pre-trained unconditional diffusion model. Specifically, we introduce the variational mode-seeking loss (VML) and show that its minimization at each reverse diffusion step guides the generated sample towards the MAP estimate (modes in practice). VML arises from a novel perspective of minimizing the Kullback-Leibler (KL) divergence between the diffusion posterior $p(\mathbf{x}_0|\mathbf{x}_t)$ and the measurement posterior $p(\mathbf{x}_0|\mathbf{y})$, where $\mathbf{y}$ denotes the measurement. Importantly, for linear inverse problems, VML can be analytically derived without any modeling approximations. Based on further theoretical insights, we propose VML-MAP, an empirically effective algorithm for solving inverse problems via VML minimization, and validate its efficacy in both performance and computational time through extensive experiments on diverse image-restoration tasks across multiple datasets.
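To make the KL-matching idea concrete, here is a minimal one-dimensional sketch. It is not the VML-MAP algorithm from the paper. Every modeling choice in it is an assumption made for illustration: a standard normal prior on $\mathbf{x}_0$, a variance-exploding forward process $x_t = x_0 + \sigma_t \epsilon$, a scalar linear measurement $y = a x_0 + \sigma_y n$, and the KL argument order $\mathrm{KL}\big(p(x_0|x_t)\,\|\,p(x_0|y)\big)$ read off from the wording of the abstract. Under these assumptions both posteriors are Gaussian, the KL divergence has a closed form, and minimizing it over $x_t$ moves the denoised mean onto the mode of $p(x_0|y)$. All function names are hypothetical.

```python
import math


def diffusion_posterior(x_t, sigma_t):
    """Mean and variance of p(x0 | x_t), assuming x0 ~ N(0, 1) and x_t = x0 + sigma_t * eps."""
    mean = x_t / (1.0 + sigma_t**2)
    var = sigma_t**2 / (1.0 + sigma_t**2)
    return mean, var


def measurement_posterior(y, a, sigma_y):
    """Mean and variance of p(x0 | y), assuming x0 ~ N(0, 1) and y = a * x0 + sigma_y * n."""
    mean = a * y / (a**2 + sigma_y**2)
    var = sigma_y**2 / (a**2 + sigma_y**2)
    return mean, var


def gaussian_kl(m1, v1, m2, v2):
    """Closed-form KL( N(m1, v1) || N(m2, v2) ) for scalar Gaussians."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def minimize_kl_over_x_t(y, a, sigma_y, sigma_t, x_t=0.0, steps=200):
    """Gradient descent on x_t for the KL between the two toy posteriors."""
    m2, v2 = measurement_posterior(y, a, sigma_y)
    # The objective is quadratic in x_t; use half the exact Newton step.
    curvature = 1.0 / (v2 * (1.0 + sigma_t**2) ** 2)
    lr = 0.5 / curvature
    for _ in range(steps):
        m1, _ = diffusion_posterior(x_t, sigma_t)
        # Only the mean of p(x0 | x_t) depends on x_t, so the chain rule gives:
        grad = (m1 - m2) / (v2 * (1.0 + sigma_t**2))
        x_t -= lr * grad
    return x_t


if __name__ == "__main__":
    y, a, sigma_y, sigma_t = 1.5, 2.0, 0.5, 1.0
    x_t = minimize_kl_over_x_t(y, a, sigma_y, sigma_t)
    denoised_mean, _ = diffusion_posterior(x_t, sigma_t)
    map_estimate, _ = measurement_posterior(y, a, sigma_y)  # Gaussian: mean equals mode
    print(denoised_mean, map_estimate)
```

In this toy setting the variances do not depend on $x_t$, so the KL divergence cannot be driven to zero; only its mean-mismatch term vanishes at the minimizer. Real image priors are not Gaussian, which is where the pre-trained diffusion model and the paper's analysis come in.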
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Gaussian Deblurring | FFHQ 256x256 (val) | FID | 84.88 | 24 |
| Image Inpainting | FFHQ 256x256 (val) | FID | 52.76 | 22 |
| 4x super-resolution | FFHQ 256x256 (val) | FID | 52.2 | 19 |
| Super-Resolution (x4) | ImageNet 256 x 256 (val) | FID | 58.6 | 17 |
| Face inpainting (Half) | CelebA-HQ-256 (test) | LPIPS | 0.208 | 12 |
| Uniform deblurring | ImageNet 256x256 (val) | LPIPS | 0.367 | 12 |
| Super-Resolution | ImageNet 256 | PSNR | 23.63 | 12 |
| Box Inpainting | ImageNet 256 x 256 (val) | FID | 75.8 | 11 |
| Inpainting | ImageNet 256x256 (val) | LPIPS | 0.262 | 7 |
| Deblurring | ImageNet 256 | PSNR | 20.4 | 7 |