# Exploiting Diffusion Prior for Real-World Image Super-Resolution

## About
We present a novel approach that leverages the prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution (SR). Specifically, by employing our time-aware encoder, we achieve promising restoration results without altering the pre-trained synthesis model, thereby preserving the generative prior and minimizing training cost. To remedy the loss of fidelity caused by the inherent stochasticity of diffusion models, we employ a controllable feature wrapping module that lets users balance quality and fidelity by simply adjusting a scalar value at inference time. Moreover, we develop a progressive aggregation sampling strategy to overcome the fixed-size constraint of pre-trained diffusion models, enabling adaptation to images of arbitrary resolution. A comprehensive evaluation on both synthetic and real-world benchmarks demonstrates the superiority of our method over current state-of-the-art approaches. Code and models are available at https://github.com/IceClear/StableSR.
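The quality–fidelity trade-off described above can be sketched as a scalar interpolation between the diffusion decoder's generative features and the encoder features extracted from the low-resolution input. This is a minimal illustrative sketch, not the actual StableSR module: the real controllable feature wrapping applies a learned transform to the encoder features, and the function and variable names below are assumptions made for this example.

```python
import numpy as np

def feature_wrap(dec_feat: np.ndarray, enc_feat: np.ndarray, w: float) -> np.ndarray:
    """Blend decoder features with encoder features via a user-set scalar w.

    w = 0.0 keeps the purely generative output (higher perceptual quality),
    w = 1.0 leans fully on the low-resolution encoder features (higher
    fidelity to the input). Hypothetical identity blend; the actual module
    passes the features through a learned convolutional transform first.
    """
    return dec_feat + w * (enc_feat - dec_feat)

# Toy 1-D "feature maps" standing in for real decoder/encoder activations.
dec = np.zeros(4)
enc = np.ones(4)

print(feature_wrap(dec, enc, 0.0))  # pure generative path: [0. 0. 0. 0.]
print(feature_wrap(dec, enc, 0.5))  # balanced trade-off:   [0.5 0.5 0.5 0.5]
```

At inference the user only tunes `w`; no retraining is required, which is what makes the trade-off controllable.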
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Detection | COCO 2017 (val) | -- | -- | 2643 |
| Instance Segmentation | COCO 2017 (val) | APm | 0.146 | 1201 |
| Semantic Segmentation | ADE20K | mIoU | 19.6 | 1024 |
| Super-Resolution | Set5 | PSNR | 20.35 | 785 |
| Super-Resolution | DIV2K | PSNR | 20.59 | 134 |
| Image Super-Resolution | RealSR | PSNR | 26.27 | 130 |
| Image Super-Resolution | DRealSR | MANIQA | 0.5601 | 130 |
| Image Super-Resolution | DIV2K (val) | LPIPS | 0.3113 | 106 |
| Super-Resolution | ODI-SR (test) | WS-PSNR | 22.29 | 93 |
| Super-Resolution | SUN 360 Panorama (test) | WS-PSNR | 22.55 | 70 |