Improving Image Restoration through Removing Degradations in Textual Representations
About
In this paper, we introduce a new perspective for improving image restoration: removing degradations in the textual representations of a given degraded image. Intuitively, restoration is much easier in the text modality than in the image modality. For example, it can be done simply by removing degradation-related words while keeping the content-related words. Hence, we combine the strength of images in describing details with the strength of text in removing degradations to perform restoration. To realize this cross-modal assistance, we propose to map degraded images into textual representations, remove the degradations there, and then convert the restored textual representations into a guidance image that assists image restoration. In particular, we embed an image-to-text mapper and a text restoration module into CLIP-equipped text-to-image models to generate the guidance. We then adopt a simple coarse-to-fine approach to dynamically inject multi-scale information from the guidance into image restoration networks. Extensive experiments are conducted on various image restoration tasks, including deblurring, dehazing, deraining, denoising, and all-in-one image restoration. The results show that our method outperforms state-of-the-art methods across all these tasks. The code and models are available at https://github.com/mrluin/TextualDegRemoval.
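The intuition that text is easy to "restore" can be illustrated with a toy word filter. This is only a sketch of the word-level example given above, not the paper's method: the method itself works on learned textual representations produced by an image-to-text mapper inside a CLIP-equipped text-to-image model. `DEGRADATION_WORDS` and `remove_degradation_words` are hypothetical names, and the word list is an illustrative assumption.

```python
# Toy illustration (hypothetical, not the paper's implementation):
# "restoring" a caption by dropping degradation-related words while
# keeping the words that describe image content.
DEGRADATION_WORDS = {"blurry", "blurred", "hazy", "foggy", "rainy", "noisy", "grainy"}


def remove_degradation_words(caption: str) -> str:
    """Return the caption with degradation-related words removed."""
    kept = [
        word
        for word in caption.split()
        # Compare case-insensitively and ignore trailing commas/periods.
        if word.lower().strip(",.") not in DEGRADATION_WORDS
    ]
    return " ".join(kept)


print(remove_degradation_words("a blurry photo of a red car on a rainy street"))
```

The content words ("photo", "red car", "street") survive while the degradation descriptors are dropped; in the image domain, by contrast, separating degradation from content requires a full restoration network.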
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Deraining | Rain100L | PSNR (dB) | 37.58 | 190 |
| Image Dehazing | SOTS | PSNR (dB) | 31.63 | 141 |
| All-in-one Image Restoration | SOTS Rain100L BSD68 Five-pattern setting | Average PSNR (dB) | 32.56 | 9 |
| Image Denoising | BSD68 | PSNR (dB), σ = 15 | 34.01 | 9 |
| All-in-one Image Restoration | Five-pattern setting | Parameters (M) | 112 | 9 |
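Most results above are reported as PSNR, defined as 10 · log10(MAX² / MSE) in decibels. The sketch below computes it on flattened pixel values. The function name `psnr` and the default `max_value=255.0` (8-bit images) are assumptions for illustration; benchmark protocols differ in details such as color space and border cropping, so this sketch may not reproduce the reported numbers exactly.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value**2 / MSE)."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by 1 on an 8-bit scale gives MSE = 1, i.e. about 48.13 dB.
print(psnr([0, 0, 0, 0], [1, 1, 1, 1]))
```

Because PSNR is logarithmic in the MSE, halving the MSE raises PSNR by about 3 dB, which is why differences of a few tenths of a dB are meaningful when comparing methods.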