ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning

About

Recent text-guided image editing (TIE) models have achieved remarkable progress; however, many edited results still suffer from artifacts, unintended modifications, and suboptimal aesthetics. Although several benchmarks and evaluation methods have been proposed, most existing approaches rely on scalar scores and lack interpretability. This limitation largely stems from the absence of high-quality interpretation datasets for TIE and of effective reward models for training interpretable evaluators. To address these challenges, we introduce ReasonEdit-22K, the first dataset that combines 22K edited images with 113K Chain-of-Thought (CoT) samples, along with 1.3M human judgments assessing these interpretations in terms of logicality, accuracy, and usefulness. Building upon this dataset, we propose RE-Reward, a multimodal large language model (MLLM)-based reward model designed to provide human-aligned feedback for evaluating interpretable reasoning in image editing. Furthermore, we develop ReasonEdit, an interpretable evaluation model trained with the Group Relative Policy Optimization (GRPO) algorithm using reward signals derived from RE-Reward. Extensive experiments demonstrate that ReasonEdit achieves superior alignment with human preferences and exhibits strong generalization across public benchmarks. In addition, it generates high-quality interpretable evaluation text, enabling more transparent and trustworthy assessment of image editing. The code is available at https://github.com/IntMeGroup/ReasonEdit.

Honghua Chen, Zitong Xu, Huiyu Duan, Xinyun Zhang, Xiongkuo Min, Guangtao Zhai • 2026
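
The abstract says ReasonEdit is trained with GRPO on reward signals from RE-Reward, but it does not spell out the reward formula. As a rough illustration of the group-relative step only, the sketch below normalizes the rewards of several sampled evaluation texts for the same edited image against their group mean and standard deviation. The function names, the equal-weight combination of logicality, accuracy, and usefulness, and the use of the population standard deviation are all assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def combine_reward(logicality, accuracy, usefulness):
    """Hypothetical scalar reward: plain average of the three scores.

    The abstract names these three dimensions but not how they are
    combined, so equal weighting is an assumption of this sketch.
    """
    return (logicality + accuracy + usefulness) / 3.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), which is
    the group-relative idea behind GRPO. Population std is an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical evaluation texts sampled for the same edited image,
# each scored on (logicality, accuracy, usefulness) in [0, 1].
scores = [(0.2, 0.2, 0.2), (0.5, 0.5, 0.5), (0.8, 0.8, 0.8)]
rewards = [combine_reward(*s) for s in scores]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group average receive a positive advantage and those below receive a negative one, so the policy update needs no separate value network.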

Related benchmarks

Task	Dataset	Result
Visual Quality Evaluation	EBench-18K	SRCC 0.9466	44
Instruction Alignment Evaluation	EBench-18K	SRCC 92.09	22
Image editing point-wise evaluation	ImagenHub	Spearman Rank Correlation 0.7566	22
Reward Modeling (Accuracy)	ReasonEdit-Reward 113K 1.0 (test)	SRCC 0.9323	16
Reward Modeling (Logicality)	ReasonEdit-Reward 113K 1.0 (test)	SRCC 0.8958	16
Reward Modeling (Usefulness)	ReasonEdit-Reward 113K 1.0 (test)	SRCC 0.9428	16
Image Editing Quality Assessment	GenAI-Bench	Accuracy 83.9	10
Image Editing Quality Assessment	AROURA-Bench	Accuracy 72.22	10
Image Editing Quality Assessment	EditScore-Bench	Accuracy 0.7848	10
Image Editing Quality Assessment	IEQA	SRCC 0.583	10
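
Most rows above report SRCC (Spearman rank correlation coefficient) between model scores and human scores. SRCC lies in [-1, 1], so a value such as 92.09 is presumably on a percentage scale. The following is a minimal pure-Python sketch of the metric, computed as the Pearson correlation of average ranks; the function names and the sample scores are illustrative and not taken from the ReasonEdit code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def srcc(predicted, human):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(human))


# Hypothetical model scores and human ratings for four edited images.
model_scores = [0.10, 0.40, 0.35, 0.80]
human_scores = [1, 3, 2, 4]
print(srcc(model_scores, human_scores))
```

Because only the ordering matters, SRCC is unaffected by any monotonic rescaling of the model's scores, which is why it is a common choice for comparing evaluators whose outputs live on different scales.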

