
Controlling Vision-Language Models for Multi-Task Image Restoration

About

Vision-language models such as CLIP have shown great impact on diverse downstream tasks for zero-shot or label-free predictions. However, when it comes to low-level vision tasks such as image restoration, their performance deteriorates dramatically due to corrupted inputs. In this paper, we present a degradation-aware vision-language model (DA-CLIP) to better transfer pretrained vision-language models to low-level vision tasks as a multi-task framework for image restoration. More specifically, DA-CLIP trains an additional controller that adapts the fixed CLIP image encoder to predict high-quality feature embeddings. By integrating this embedding into an image restoration network via cross-attention, we are able to pilot the model to learn a high-fidelity image reconstruction. The controller itself also outputs a degradation feature that matches the real corruptions of the input, yielding a natural classifier for different degradation types. In addition, we construct a mixed degradation dataset with synthetic captions for DA-CLIP training. Our approach advances state-of-the-art performance on both degradation-specific and unified image restoration tasks, showing a promising direction of prompting image restoration with large-scale pretrained vision-language models. Our code is available at https://github.com/Algolzw/daclip-uir.
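The two mechanisms named in the abstract, cross-attention conditioning and degradation classification from an embedding, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the projection matrices `w_q`, `w_k`, `w_v`, the residual connection, and the use of several context tokens are all assumptions made for illustration. The actual code lives in the repository linked above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(feats, context, w_q, w_k, w_v):
    """Condition restoration features on embedding tokens (hypothetical helper).

    feats:   (n, d) features of the restoration network, used as queries.
    context: (m, c) embedding tokens, used as keys and values.
    Returns (n, d) features with the attended context added residually.
    """
    q = feats @ w_q                                   # (n, h)
    k = context @ w_k                                 # (m, h)
    v = context @ w_v                                 # (m, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (n, m), rows sum to 1
    return feats + attn @ v

def classify_degradation(deg_emb, text_embs, labels):
    """Pick the label whose text embedding is most cosine-similar to deg_emb."""
    deg = deg_emb / np.linalg.norm(deg_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ deg                                  # (num_labels,)
    return labels[int(np.argmax(sims))], sims

# Toy usage with random stand-ins for real features and embeddings.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
context = rng.normal(size=(3, 5))
w_q, w_k, w_v = (rng.normal(size=s) for s in [(8, 4), (5, 4), (5, 8)])
conditioned = cross_attention(feats, context, w_q, w_k, w_v)
label, _ = classify_degradation(
    np.array([0.1, 0.9, 0.2]), np.eye(3), ["hazy", "rainy", "blurry"]
)
```

If the context is a single embedding vector (m = 1), the attention weights are all 1 and the layer reduces to adding one projected vector to every feature position.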

Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas B. Schön • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| Image Deblurring | RealBlur-J (test) | PSNR 20.53 | 226 |
| Image Deblurring | GoPro | PSNR 26.5 | 221 |
| Image Dehazing | SOTS (test) | PSNR 30.12 | 161 |
| Image Deraining | Rain100L (test) | PSNR 35.92 | 161 |
| Low-light Image Enhancement | LOL | PSNR 24.17 | 122 |
| Dehazing | SOTS | PSNR 29.78 | 117 |
| Deraining | Rain100L | PSNR 36.28 | 116 |
| Low-light Image Enhancement | LOL v1 | PSNR 21.94 | 113 |
| Image Dehazing | SOTS Outdoor | PSNR 28.1 | 112 |
| Denoising | BSD68 sigma=25 | PSNR 30.42 | 70 |
Showing 10 of 73 rows
...
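For readers unfamiliar with the metric in the results above, PSNR (peak signal-to-noise ratio, in dB) is conventionally computed as 10 * log10(MAX^2 / MSE) between a restored image and its reference. The sketch below uses that standard definition; the function name and the 8-bit default `max_value=255.0` are assumptions, and individual benchmarks may differ in details such as color space or cropping.

```python
import math
import numpy as np

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(restored, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy usage: a flat gray image and a copy that is off by one intensity level.
clean = np.full((4, 4), 128.0)
noisy = clean + 1.0
score = psnr(clean, noisy)
```

Higher values indicate a restored image closer to the reference.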
