
UniWorld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

About

Instruction-based image editing has achieved remarkable progress; however, models trained solely via supervised fine-tuning often overfit to annotated patterns, which hinders their ability to explore and generalize beyond the training distribution. To address this, we introduce Edit-R1, a novel post-training framework for instruction-based image editing based on policy optimization. Specifically, we utilize Diffusion Negative-aware Finetuning (DiffusionNFT), a likelihood-free policy optimization method consistent with the flow matching forward process, which enables the use of higher-order samplers and more efficient training. Another key challenge is the absence of a universal reward model, a consequence of the diverse nature of editing instructions and tasks. To bridge this gap, we employ a Multimodal Large Language Model (MLLM) as a unified, training-free reward model, leveraging its output logits to provide fine-grained feedback. Furthermore, we carefully design a low-variance group filtering mechanism to reduce MLLM scoring noise and stabilize optimization. UniWorld-V2, trained with this framework, achieves state-of-the-art results on the ImgEdit and GEdit-Bench benchmarks, scoring 4.49 and 7.83, respectively. Crucially, our framework is model-agnostic, delivering substantial performance gains when applied to diverse base models such as Qwen-Image-Edit and FLUX-Kontext, demonstrating its wide applicability. Code and models are publicly available to support further research.
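
The abstract mentions two mechanisms that a small example may make more concrete: turning MLLM output logits into a fine-grained score, and filtering out low-variance reward groups before optimization. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the 1-to-5 score scale, the softmax-weighted expectation, and the `min_std` threshold are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
from statistics import mean, pstdev


def expected_score(score_logits):
    """Softmax-weighted expected score from logits over candidate score tokens.

    score_logits maps an integer score (e.g. 1..5) to the logit the MLLM
    assigned to that score token. Assumption: the reward is the expectation
    over the score distribution rather than the single argmax score.
    """
    top = max(score_logits.values())  # subtract the max for numerical stability
    weights = {s: math.exp(l - top) for s, l in score_logits.items()}
    total = sum(weights.values())
    return sum(s * w for s, w in weights.items()) / total


def filter_low_variance_groups(groups, min_std=0.05):
    """Drop groups whose rewards barely differ (hypothetical threshold).

    When all samples in a group score almost the same, the differences are
    likely dominated by scoring noise, and normalizing would amplify it.
    """
    return [g for g in groups if pstdev(g) >= min_std]


def group_normalize(rewards, eps=1e-6):
    """Standardize rewards within one group (zero mean, roughly unit std)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical logits for the score tokens "1".."5" of one edited image.
    logits = {1: -2.0, 2: -1.0, 3: 0.5, 4: 2.0, 5: 1.0}
    print(round(expected_score(logits), 3))

    # Two groups of rewards, one per editing prompt.
    groups = [[0.90, 0.91, 0.90], [0.2, 0.8, 0.5]]
    kept = filter_low_variance_groups(groups)
    print([group_normalize(g) for g in kept])
```

In this sketch the expectation yields a continuous reward even though the MLLM only emits discrete score tokens, and the first group is discarded because its rewards are nearly identical.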

Zongjian Li, Zheyuan Liu, Qihui Zhang, Bin Lin, Feize Wu, Shenghai Yuan, Zhiyuan Yan, Yang Ye, Wangbo Yu, Yuwei Niu, Shaodong Wang, Xinhua Cheng, Li Yuan • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image Editing | ImgEdit-Bench | Overall Score: 4.49 | 191 |
| Image Editing | GEdit-Bench | Semantic Consistency: 8.36 | 92 |
| Image Editing | GEdit-Bench English | G_O (Overall Quality): 7.83 | 84 |
| Image Editing | KRIS-Bench | Factual Knowledge Score: 0.6172 | 74 |
| Combined | Multilingual Benchmark | IA Score: 4.62 | 34 |
| Image Editing | ImgEdit | Overall Score: 4.48 | 22 |
| Multi-image Reasoning | OmniContext | Single Scene Char Score: 8.45 | 20 |
| Image Editing | WeEdit Bilingual Benchmark 1.0 (test) | Add IA Score: 5.15 | 17 |
| Reasoning | Multilingual Benchmark | IA Score: 1.13 | 17 |
| Style | Multilingual Benchmark | IA: 5.63 | 17 |
Showing 10 of 22 rows
