EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
About
Image editing with natural language instructions has progressed rapidly. Closed-source models such as GPT-Image-1, Seedream, and Google-Nano-Banana have shown highly promising results, but open-source models still lag behind. The main bottleneck is the lack of a reliable reward model for scaling up high-quality synthetic training data. To address this bottleneck, we built EditReward, trained on our new large-scale human preference dataset of over 200K preference pairs, meticulously annotated by trained experts following a rigorous protocol. EditReward demonstrates superior alignment with human preferences in instruction-guided image editing tasks. Experiments show that EditReward achieves state-of-the-art human correlation on established benchmarks such as GenAI-Bench, AURORA-Bench, and ImagenHub, as well as on our new EditReward-Bench, outperforming a wide range of VLM-as-judge models. We further use EditReward to select a high-quality subset from the existing noisy ShareGPT-4o-Image dataset. Training Step1X-Edit on the selected subset yields a significant improvement over training on the full set, which demonstrates EditReward's ability to serve as a reward model for scaling up high-quality training data for image editing. Its strong alignment also suggests potential for advanced applications such as reinforcement learning-based post-training and test-time scaling of image editing models. EditReward and its training dataset will be released to help the community build more high-quality image editing training datasets.
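
The subset-selection step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the `EditSample` record, the `select_top_fraction` helper, the `keep_fraction` parameter, and the toy scoring function are all hypothetical, and a real run would plug in a learned reward model as `score_fn`. The abstract does not state how large the selected subset is, so the fraction used here is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class EditSample:
    """One editing example (hypothetical record layout)."""
    source_image: str   # path or ID of the input image
    instruction: str    # natural-language edit instruction
    edited_image: str   # path or ID of the edited output


def select_top_fraction(
    samples: List[EditSample],
    score_fn: Callable[[EditSample], float],
    keep_fraction: float,
) -> List[EditSample]:
    """Keep the highest-scoring fraction of samples under score_fn."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not samples:
        return []
    # Score every sample once; the index breaks ties deterministically.
    scored = [(score_fn(s), i, s) for i, s in enumerate(samples)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    # Always keep at least one sample.
    k = max(1, int(len(samples) * keep_fraction))
    return [s for _, _, s in scored[:k]]


# Stand-in scorer (assumption): a real reward model would look at the images.
def toy_score(sample: EditSample) -> float:
    return float(len(sample.instruction))


dataset = [
    EditSample("a.png", "add a hat", "a_out.png"),
    EditSample("b.png", "make the sky purple at dusk", "b_out.png"),
    EditSample("c.png", "crop", "c_out.png"),
    EditSample("d.png", "remove the red car", "d_out.png"),
]
subset = select_top_fraction(dataset, toy_score, keep_fraction=0.5)
```

Ranking by score and keeping a fixed fraction is only one possible filtering rule; an absolute score threshold would be an equally plausible choice.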
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Editing | GEdit-Bench-EN (full) | G-Score (O) | 7.086 | 66 |
| Image Editing | GEdit-Bench-CN (Full set) | G_SC | 7.658 | 29 |
| Image Editing | GEdit-Bench-EN Intersection subset v1.0 | G_SC | 7.895 | 19 |
| Reward Modeling | EditReward-Bench | PF | 83.2 | 17 |
| Instruction-guided image editing preference prediction | GenAI-Bench | Accuracy | 65.72 | 12 |
| Instruction-guided image editing preference prediction | AURORA-Bench | Accuracy | 63.62 | 12 |
| Image editing point-wise evaluation | ImagenHub | Spearman Rank Correlation | 36.18 | 12 |
| Multi-way preference ranking | EditReward-Bench | Preference Score (K=2) | 56.99 | 11 |
| Image Editing | GEdit-Bench-CN-I (intersection) | G_SC Score | 7.757 | 10 |
| Reward Modeling | MMRB 2 | Single-turn Score | 67.2 | 9 |
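
For readers unfamiliar with preference-prediction accuracy, the sketch below shows one generic way such a number can be computed from reward scores. It is an illustration only and not necessarily the exact protocol of GenAI-Bench or AURORA-Bench: the `pairwise_accuracy` helper is hypothetical, the scores and labels are made up, and counting ties as errors is an assumption.

```python
from typing import List, Sequence, Tuple


def pairwise_accuracy(
    score_pairs: Sequence[Tuple[float, float]],
    human_prefers_first: Sequence[bool],
) -> float:
    """Fraction of pairs where the higher-scored edit matches the human choice.

    Assumption: a tie in scores predicts no winner and counts as an error.
    """
    if len(score_pairs) != len(human_prefers_first):
        raise ValueError("scores and labels must have the same length")
    if not score_pairs:
        raise ValueError("need at least one pair")
    correct = 0
    for (score_a, score_b), prefers_a in zip(score_pairs, human_prefers_first):
        if score_a == score_b:
            continue  # tie: counted as incorrect
        if (score_a > score_b) == prefers_a:
            correct += 1
    return correct / len(score_pairs)


# Made-up reward scores for two candidate edits per pair, plus human labels.
scores: List[Tuple[float, float]] = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.6), (0.7, 0.3)]
labels: List[bool] = [True, False, True, True]
accuracy = pairwise_accuracy(scores, labels)
```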