Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

About

Instruction-based image editing has achieved remarkable progress; however, models solely trained via supervised fine-tuning often overfit to annotated patterns, hindering their ability to explore and generalize beyond training distributions. To this end, we introduce Edit-R1, a novel post-training framework for instruction-based image editing based on policy optimization. Specifically, we utilize Diffusion Negative-aware Finetuning (DiffusionNFT), a likelihood-free policy optimization method consistent with the flow matching forward process, thereby enabling the use of higher-order samplers and more efficient training. Another key challenge here is the absence of a universal reward model, resulting from the diverse nature of editing instructions and tasks. To bridge this gap, we employ a Multimodal Large Language Model (MLLM) as a unified, training-free reward model, leveraging its output logits to provide fine-grained feedback. Furthermore, we carefully design a low-variance group filtering mechanism to reduce MLLM scoring noise and stabilize optimization. UniWorld-V2, trained with this framework, achieves state-of-the-art results on the ImgEdit and GEdit-Bench benchmarks, scoring 4.49 and 7.83, respectively. Crucially, our framework is model-agnostic, delivering substantial performance gains when applied to diverse base models like Qwen-Image-Edit and FLUX-Kontext, demonstrating its wide applicability. Code and models are publicly available to support further research.
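As a rough illustration of two ideas named in the abstract (a logit-based reward and low-variance group filtering), the sketch below turns logits over candidate score tokens into an expected score, drops sample groups whose rewards barely differ, and normalizes rewards within a group. This is not the authors' implementation: the function names, the 1-to-5 score scale, the `min_std` threshold, and the group normalization step are all assumptions made for the example, since the abstract does not specify them.

```python
import math
from statistics import mean, pstdev


def expected_score(score_logits):
    """Softmax-weighted expected score.

    `score_logits` maps a numeric score (e.g. 1..5) to the raw logit an
    MLLM assigned to that score token. Hypothetical interface.
    """
    top = max(score_logits.values())
    # Subtract the max logit before exponentiating for numerical stability.
    weights = {s: math.exp(l - top) for s, l in score_logits.items()}
    total = sum(weights.values())
    return sum(s * w for s, w in weights.items()) / total


def filter_low_variance_groups(groups, min_std=0.05):
    """Keep only groups whose reward spread is at least `min_std`.

    The threshold value is an assumption chosen for illustration.
    """
    return [g for g in groups if pstdev(g) >= min_std]


def group_advantages(rewards, eps=1e-6):
    """Center and scale the rewards of one group (assumed normalization)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    logits = {1: -2.0, 2: -1.0, 3: 0.5, 4: 2.0, 5: 1.0}
    print(expected_score(logits))

    groups = [[4.00, 4.01, 4.00], [1.0, 3.0, 5.0]]
    kept = filter_low_variance_groups(groups)
    print(kept)
    print(group_advantages(kept[0]))
```

The motivation for the filter in this sketch is that dividing by a tiny standard deviation would magnify small, possibly noisy score differences into large training signals, so near-uniform groups are skipped instead.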

Zongjian Li, Zheyuan Liu, Qihui Zhang, Bin Lin, Feize Wu, Shenghai Yuan, Zhiyuan Yan, Yang Ye, Wangbo Yu, Yuwei Niu, Shaodong Wang, Xinhua Cheng, Li Yuan • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Editing | ImgEdit-Bench | Overall Score 4.49 | 132 |
| Image Editing | GEdit-Bench English | G_O (Overall Quality) 7.83 | 73 |
| Image Editing | KRIS-Bench | Factual Knowledge Score 0.6172 | 65 |
| Image Editing | GEdit-Bench | Semantic Consistency 8.36 | 46 |
| Multi-image Reasoning | OmniContext | Single Scene Char Score 8.45 | 20 |
| Subject-driven image generation | SconeEval | Composition Single COM 8.41 | 11 |
| Poster Creation | PosterOmni-Bench en | Extending Score 4.25 | 10 |
| Global Image Editing | GEdit-Bench | Style Score 6.935 | 7 |
| Poster Creation | PosterOmni-Bench cn | Content Extension Score 4.22 | 7 |