HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing
About
This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches that rely on attribute guidance or human feedback to build datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure high quality, diverse examples are first collected online and expanded, then used to create high-quality diptychs featuring input and output images with detailed text prompts; precise alignment is then ensured through post-processing. In addition, we propose two evaluation metrics, Alignment and Coherence, to quantitatively assess the quality of image edit pairs using GPT-4V. HQ-Edit's high-resolution images, rich in detail and accompanied by comprehensive editing prompts, substantially enhance the capabilities of existing image editing models. For example, an InstructPix2Pix model fine-tuned on HQ-Edit attains state-of-the-art image editing performance, even surpassing models fine-tuned with human-annotated data. The project page is https://thefllood.github.io/HQEdit_web.
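As a rough illustration of how an InstructPix2Pix-style model fine-tuned on HQ-Edit might be used at inference time, the sketch below runs an instruction edit with the Hugging Face diffusers pipeline. The public base checkpoint `timbrooks/instruct-pix2pix`, the file paths, and the example instruction are assumptions for illustration only; they are not artifacts released by this project.

```python
# Minimal sketch: instruction-based image editing with an InstructPix2Pix-style pipeline.
# Checkpoint and file paths below are illustrative assumptions, not paper-provided artifacts.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Base InstructPix2Pix weights; an HQ-Edit fine-tuned checkpoint would be substituted here.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("input.jpg").convert("RGB").resize((512, 512))

# Free-form editing instruction, in the same style as HQ-Edit prompts.
edited = pipe(
    "replace the wooden table with a glass one",
    image=image,
    num_inference_steps=30,
    image_guidance_scale=1.5,  # how closely the output should follow the input image
    guidance_scale=7.5,        # how strongly the output should follow the instruction
).images[0]

edited.save("edited.jpg")
```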
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Instructive image editing | EMU Edit (test) | CLIP Image Similarity | 0.7095 | 46 |
| Object Retexture | UHRSD (test) | MSE | 8.03e+3 | 14 |
| Image Editing | ECSSD (test) | MSE | 7.73e+3 | 13 |
| Image Editing Quality Evaluation | Various Image Editing Datasets | Instruction Adherence Score | 2.9 | 12 |
| Multi-object editing | CompBench | LC-T | 19.163 | 11 |