PromptFix: You Prompt and We Fix the Photo

About

Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks, allowing image processing to adhere to human instructions. However, the lack of diverse instruction-following data hampers the development of models that effectively recognize and execute user-customized instructions, particularly in low-level tasks. Moreover, the stochastic nature of the diffusion process leads to deficiencies in image generation or editing tasks that require the detailed preservation of the generated images. To address these limitations, we propose PromptFix, a comprehensive framework that enables diffusion models to follow human instructions to perform a wide variety of image-processing tasks. First, we construct a large-scale instruction-following dataset that covers comprehensive image-processing tasks, including low-level tasks, image editing, and object creation. Next, we propose a high-frequency guidance sampling method to explicitly control the denoising process and preserve high-frequency details in unprocessed areas. Finally, we design an auxiliary prompting adapter, utilizing Vision-Language Models (VLMs) to enhance text prompts and improve the model's task generalization. Experimental results show that PromptFix outperforms previous methods in various image-processing tasks. Our proposed model also achieves inference efficiency comparable to that of these baseline models and exhibits superior zero-shot capabilities in blind restoration and combination tasks. The dataset and code are available at https://www.yongshengyu.com/PromptFix-Page.

Yongsheng Yu, Ziyun Zeng, Hang Hua, Jianlong Fu, Jiebo Luo • 2024

Related benchmarks

Task	Dataset	Result
Low-light enhancement	Low-light enhancement dataset	LPIPS 0.135	11
Desnowing	Desnow dataset	LPIPS 0.103	7
Dehazing	Dehazy dataset	LPIPS 0.088	7
Super-Resolution	Super Resolution dataset	LPIPS 0.143	7
Desnowing	Images 200 sampled (test)	LPIPS 0.115	5
Low-light enhancement	200 sampled images (test)	LPIPS 0.161	5
Multi-task Image Restoration	Multi-degradation 200 sampled images (test)	PSNR 22.05	5
Dehazing	200 sampled images (test)	LPIPS 0.148	5
Colorization	Colorization dataset	LPIPS 0.233	4
Object Removal	Object Removal	LPIPS 0.054	4

Showing 10 of 11 rows
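
The multi-task restoration row reports PSNR, while the other rows report LPIPS. As a reading aid, here is a minimal sketch of the standard PSNR definition, assuming 8-bit pixel values flattened into plain Python sequences. The function name `psnr` and the `max_value` parameter are illustrative choices, and the exact evaluation protocol behind the numbers in the table (color space, value range, cropping) is not stated on this page.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumption: pixels are given as flat sequences of numbers in [0, max_value].
    """
    if len(reference) != len(restored):
        raise ValueError("inputs must have the same number of pixels")
    if not reference:
        raise ValueError("inputs must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)

    # Identical inputs have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [10, 20, 30, 40]
    noisy = [12, 18, 33, 39]
    print(f"PSNR: {psnr(clean, noisy):.2f} dB")
```

Higher PSNR indicates a closer match to the reference image, whereas for LPIPS a lower value is better, so the two kinds of rows in the table are not directly comparable.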
