EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits

About

Text-guided image editing, fueled by recent advancements in generative AI, is becoming increasingly widespread. This trend highlights the need for a comprehensive framework to verify text-guided edits and assess their quality. To address this need, we introduce EditInspector, a novel benchmark for evaluation of text-guided image edits, based on human annotations collected using an extensive template for edit verification. We leverage EditInspector to evaluate the performance of state-of-the-art (SoTA) vision and language models in assessing edits across various dimensions, including accuracy, artifact detection, visual quality, seamless integration with the image scene, adherence to common sense, and the ability to describe edit-induced changes. Our findings indicate that current models struggle to evaluate edits comprehensively and frequently hallucinate when describing the changes. To address these challenges, we propose two novel methods that outperform SoTA models in both artifact detection and difference caption generation.

Ron Yosef, Moran Yanuka, Yonatan Bitton, Dani Lischinski • 2025

Related benchmarks

Task	Dataset	Result
Visual Question Answering (Edit Inspection)	EditInspector 1.0 (test)	Accuracy: 67.2	9
Difference Caption Generation	EditInspector 1.0 (test)	Main Difference Count: 10	9
Edit Inspection Questions	Imagen3 edits	Accuracy: 58.8	7
Difference Caption Generation	UltraEdit	Main Difference Score: 9	7
Edit Inspectors Question Answering	MagicBrush (test)	Accuracy: 62.3	7
Edit Inspectors Questions	UltraEdit	Accuracy: 54.3	7
Difference Caption Generation	Imagen3 edits	Main Difference Score: 11	7
Difference Caption Generation	MagicBrush (test)	Main Difference Count: 12	7
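The Accuracy values in the table appear to be percentages over question-answering style edit inspection. As a rough illustration only, the sketch below shows one way per-dimension and overall accuracy could be computed from model answers and human labels. It is not the authors' evaluation code: the `Judgment` record, the dimension names (taken loosely from the abstract) and the case-insensitive exact-match rule are all hypothetical assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgment:
    """One model answer to an edit-inspection question (hypothetical record format)."""
    dimension: str  # e.g. "accuracy", "artifacts", "common sense" (assumed labels)
    predicted: str  # the model's answer
    gold: str       # the human-annotated answer


def _matches(j):
    # Assumed scoring rule: case-insensitive exact match after trimming whitespace.
    return j.predicted.strip().lower() == j.gold.strip().lower()


def accuracy_by_dimension(judgments):
    """Return {dimension: fraction of answers matching the human label}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for j in judgments:
        total[j.dimension] += 1
        if _matches(j):
            correct[j.dimension] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


def overall_accuracy(judgments):
    """Micro-averaged accuracy over all questions, as a percentage."""
    if not judgments:
        return 0.0
    hits = sum(_matches(j) for j in judgments)
    return 100.0 * hits / len(judgments)


if __name__ == "__main__":
    sample = [
        Judgment("accuracy", "yes", "yes"),
        Judgment("accuracy", "no", "yes"),
        Judgment("artifacts", "Yes", "yes"),
        Judgment("common sense", "no", "no"),
    ]
    print(accuracy_by_dimension(sample))  # {'accuracy': 0.5, 'artifacts': 1.0, 'common sense': 1.0}
    print(overall_accuracy(sample))       # 75.0
```

Reporting both a per-dimension breakdown and a single micro-averaged figure is a design choice: the breakdown shows where a model struggles (for example artifact detection), while the overall number is what a leaderboard row like those above would typically show.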

