Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback

About

Instruction-based image editing has achieved remarkable progress; however, models solely trained via supervised fine-tuning often overfit to annotated patterns, hindering their ability to explore and generalize beyond training distributions. To this end, we introduce Edit-R1, a novel post-training framework for instruction-based image editing based on policy optimization. Specifically, we utilize Diffusion Negative-aware Finetuning (DiffusionNFT), a likelihood-free policy optimization method consistent with the flow matching forward process, thereby enabling the use of higher-order samplers and more efficient training. Another key challenge here is the absence of a universal reward model, resulting from the diverse nature of editing instructions and tasks. To bridge this gap, we employ a Multimodal Large Language Model (MLLM) as a unified, training-free reward model, leveraging its output logits to provide fine-grained feedback. Furthermore, we carefully design a low-variance group filtering mechanism to reduce MLLM scoring noise and stabilize optimization. UniWorld-V2, trained with this framework, achieves state-of-the-art results on the ImgEdit and GEdit-Bench benchmarks, scoring 4.49 and 7.83, respectively. Crucially, our framework is model-agnostic, delivering substantial performance gains when applied to diverse base models like Qwen-Image-Edit and FLUX-Kontext, demonstrating its wide applicability. Code and models are publicly available to support further research.
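
The two reward-side ideas in the abstract, logit-based MLLM scoring and low-variance group filtering, can be illustrated with a minimal sketch. This is not the authors' implementation: the score-token set, the softmax-weighted expectation, the `min_std` threshold, and every function name below are assumptions chosen for illustration, and the abstract does not specify the exact filtering criterion.

```python
import math
from statistics import mean, pstdev


def expected_score(score_logits):
    """Turn MLLM logits over score tokens into one continuous reward.

    score_logits maps a score token such as "1".."5" to its raw logit.
    The token set and the expectation rule are assumptions for this sketch.
    """
    top = max(score_logits.values())
    # Subtract the max logit before exponentiating for numerical stability.
    weights = {tok: math.exp(v - top) for tok, v in score_logits.items()}
    total = sum(weights.values())
    return sum(int(tok) * w for tok, w in weights.items()) / total


def keep_group(rewards, min_std=0.05):
    """Hypothetical filter: drop a sample group whose rewards barely differ.

    When the spread is tiny, normalizing by it mostly amplifies scoring noise.
    """
    return pstdev(rewards) >= min_std


def group_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


if __name__ == "__main__":
    logits = {"1": -2.0, "2": -1.0, "3": 0.5, "4": 2.0, "5": 1.0}
    print("reward:", expected_score(logits))
    for group in ([4.10, 4.11, 4.10, 4.12], [2.0, 3.5, 4.5, 1.0]):
        if keep_group(group):
            print("kept:", group_advantages(group))
        else:
            print("filtered (low variance):", group)
```

Using the expectation over score tokens rather than the single most likely token gives a continuous reward, which is one plausible reading of "fine-grained feedback" from output logits.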

Zongjian Li, Zheyuan Liu, Qihui Zhang, Bin Lin, Feize Wu, Shenghai Yuan, Zhiyuan Yan, Yang Ye, Wangbo Yu, Yuwei Niu, Shaodong Wang, Xinhua Cheng, Li Yuan • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Image Editing	ImgEdit-Bench	Overall Score	4.49	256
World Knowledge Image Generation	WISE	Overall Score	58	110
Image Editing	GEdit-Bench	Semantic Consistency	8.36	102
Image Editing	KRIS-Bench	Overall Score	55.98	98
Image Editing	GEdit-Bench English	G_O (Overall Quality)	7.83	94
Image Editing	ImgEdit	ImgEdit	4.48	62
Combined	Multilingual Benchmark	IA Score	4.62	34
Image Editing	ImgEdit	Overall Score	4.48	32
Instruction-based Image Editing	GEdit-Bench-EN (test)	G_SC Score	8.39	22
Multi-image Reasoning	OmniContext	Single Scene Char Score	8.45	20

Showing 10 of 31 rows
