Prompt-tuning latent diffusion models for inverse problems

About

We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To address this limitation, we introduce a method for prompt tuning, which jointly optimizes the text embedding on-the-fly while running the reverse diffusion process. This allows us to generate images that are more faithful to the diffusion prior. In addition, we propose a method to keep the evolution of latent variables within the range space of the encoder, by projection. This helps to reduce image artifacts, a major problem when using latent diffusion models instead of pixel-based diffusion models. Our combined method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
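The two ideas in the abstract, tuning the text embedding against the measurement and projecting the latent back onto the encoder's range space, can be illustrated with a toy sketch. This is not the authors' implementation: the encoder `E`, decoder `D`, forward operator `A`, embedding map `W`, and every function name below are hypothetical linear stand-ins, chosen so that the example runs with NumPy alone. It shows a single step rather than a full reverse diffusion loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_lat, n_emb, n_meas = 16, 8, 4, 6

# Hypothetical linear stand-ins (assumptions, NOT the real networks).
B = rng.standard_normal((n_lat, 5))
C = rng.standard_normal((5, n_pix))
E = B @ C / np.sqrt(n_pix)                # "encoder": pixels -> latents, rank 5 < n_lat
D = np.linalg.pinv(E, rcond=1e-10)        # "decoder": latents -> pixels
A = rng.standard_normal((n_meas, n_pix))  # measurement (forward) operator
W = rng.standard_normal((n_lat, n_emb))   # how the embedding shifts the latent estimate


def denoise(z, c):
    """Toy clean-latent estimate conditioned on the embedding c."""
    return z + W @ c


def loss(z, c, y):
    """Data-fidelity loss 0.5 * ||y - A D denoise(z, c)||^2."""
    r = y - A @ D @ denoise(z, c)
    return 0.5 * float(r @ r)


def tune_embedding(z, c, y, steps=50):
    """Gradient descent on the embedding only; the latent z stays fixed."""
    M = A @ D @ W                          # Jacobian of the measurement w.r.t. c
    lr = 1.0 / np.linalg.norm(M, 2) ** 2   # step size 1/L for this quadratic loss
    for _ in range(steps):
        grad = -M.T @ (y - A @ D @ denoise(z, c))
        c = c - lr * grad
    return c


def project_to_encoder_range(z):
    """Decode then re-encode: E pinv(E) z, the orthogonal projector onto range(E)."""
    return E @ (D @ z)


# One toy step: start from a zero ("null") embedding, tune it, then project.
x_true = rng.standard_normal(n_pix)
y = A @ x_true
z = rng.standard_normal(n_lat)
c0 = np.zeros(n_emb)
c = tune_embedding(z, c0, y)
z_proj = project_to_encoder_range(denoise(z, c))
```

In this toy setting the tuned embedding `c` gives a lower data-fidelity loss than the zero embedding `c0`, and `z_proj` is left unchanged by a second projection. A real solver would replace the linear maps with a pretrained latent diffusion model and repeat these updates across the sampling steps.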

Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, Mauricio Delbracio • 2023

Related benchmarks

Task	Dataset	Metric	Value	
Motion Deblurring	FFHQ 1k	PSNR	25.52	13
Super-resolution (x8)	ImageNet 512 (val)	FID	55.04	7
Gaussian Deblurring	FFHQ 512 (val)	FID	45.12	7
Super-resolution (x8)	FFHQ 512 (val)	FID	52.14	7
Gaussian Deblurring	ImageNet 512 (val)	FID	59.77	7
Motion Deblurring	FFHQ 512 (val)	FID	55.73	7
Motion Deblurring	ImageNet 512 (val)	FID	159.3	7
Image Restoration	FFHQ 512 (test)	VRAM (GB)	10.6	7
Super-resolution (x8)	FFHQ 1,000 samples (test)	PSNR	25.31	6
Noisy JPEG Restoration	FFHQ 512 (val)	FID	75.57	5

