ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning

About

Recent text-guided image editing (TIE) models have achieved remarkable progress; however, many edited results still suffer from artifacts, unintended modifications, and suboptimal aesthetics. Although several benchmarks and evaluation methods have been proposed, most existing approaches rely on scalar scores and lack interpretability. This limitation largely stems from the absence of high-quality interpretation datasets for TIE and of effective reward models for training interpretable evaluators. To address these challenges, we introduce ReasonEdit-22K, the first dataset that combines 22K edited images with 113K Chain-of-Thought (CoT) samples, along with 1.3M human judgments assessing these interpretations in terms of logicality, accuracy, and usefulness. Building upon this dataset, we propose RE-Reward, a multimodal large language model (MLLM)-based reward model designed to provide human-aligned feedback for evaluating interpretable reasoning in image editing. Furthermore, we develop ReasonEdit, an interpretable evaluation model trained with the Group Relative Policy Optimization (GRPO) algorithm using reward signals derived from RE-Reward. Extensive experiments demonstrate that ReasonEdit achieves superior alignment with human preferences and exhibits strong generalization across public benchmarks. In addition, it is capable of generating high-quality interpretable evaluation text, enabling more transparent and trustworthy assessment for image editing. The code is available at https://github.com/IntMeGroup/ReasonEdit.

Honghua Chen, Zitong Xu, Huiyu Duan, Xinyun Zhang, Xiongkuo Min, Guangtao Zhai • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual Quality Evaluation | EBench-18K | SRCC 0.9466 | 44 |
| Instruction Alignment Evaluation | EBench-18K | SRCC 92.09 | 22 |
| Image editing point-wise evaluation | ImagenHub | Spearman Rank Correlation 0.7566 | 22 |
| Reward Modeling (Accuracy) | ReasonEdit-Reward 113K 1.0 (test) | SRCC 0.9323 | 16 |
| Reward Modeling (Logicality) | ReasonEdit-Reward 113K 1.0 (test) | SRCC 0.8958 | 16 |
| Reward Modeling (Usefulness) | ReasonEdit-Reward 113K 1.0 (test) | SRCC 0.9428 | 16 |
| Image Editing Quality Assessment | GenAI-Bench | Accuracy 83.9 | 10 |
| Image Editing Quality Assessment | AROURA-Bench | Accuracy 72.22 | 10 |
| Image Editing Quality Assessment | EditScore-Bench | Accuracy 0.7848 | 10 |
| Image Editing Quality Assessment | IEQA | SRCC 0.583 | 10 |
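
Most results above are reported as SRCC (Spearman rank correlation coefficient), which measures how well a model's ranking of edited images agrees with the human ranking. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function names `average_ranks` and `srcc` and the example scores are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of items tied with the item at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def srcc(predicted, human):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(predicted), average_ranks(human)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("SRCC is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical model scores and human ratings for four edited images.
model_scores = [0.2, 0.9, 0.5, 0.7]
human_ratings = [1, 5, 2, 4]
print(srcc(model_scores, human_ratings))
```

Because only ranks matter, SRCC is unaffected by any monotonic rescaling of the model's scores, which is why it is a common choice for comparing an evaluator against human opinion scores.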