Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

About

Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities relies mainly on mixed multi-task training. Because these tasks inherently conflict, this strategy requires complex multi-stage pipelines, massive data mixing, and balancing tricks, and it yields only a performance trade-off rather than true mutual reinforcement. To break this paradigm, we propose Uni-Edit, an intelligent image editing task that serves as the first general task for UMM tuning. Unlike complex mixed pipelines, Uni-Edit improves performance across all three abilities at once using only one task, one training stage, and one dataset. Specifically, we first identify image editing as an inherently ideal general task, as it naturally demands both visual understanding and generation. However, existing editing data relies on simplistic instructions that severely underutilize a model's understanding capacity. To address this, we introduce the first automated and scalable data synthesis pipeline for intelligent editing, which transforms diverse VQA data into complex and effective editing instructions with embedded questions and nested logic. This yields Uni-Edit-148k, a dataset that pairs diverse, reasoning-intensive instructions with high-quality edited images. Extensive experiments on BAGEL and Janus-Pro demonstrate that tuning solely on Uni-Edit improves all three capabilities without any auxiliary operations.
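To make the contrast between a simplistic instruction and one with an embedded question concrete, here is a toy Python sketch. It is not the authors' synthesis pipeline: the `VQARecord` type, both helper functions, and the `{target}` template slot are hypothetical names introduced only for illustration, and the abstract does not specify how instructions are actually constructed.

```python
from dataclasses import dataclass


@dataclass
class VQARecord:
    """One VQA sample (hypothetical schema): a question and its answer."""
    question: str
    answer: str


def to_simple_instruction(record: VQARecord, template: str) -> str:
    """Name the edit target directly, so no visual reasoning is needed."""
    return template.format(target=f"the {record.answer}") + "."


def to_intelligent_instruction(record: VQARecord, template: str) -> str:
    """Embed the question instead of its answer, so the model has to
    answer the question from the image before it can apply the edit."""
    target = f"the object that answers '{record.question.strip()}'"
    return template.format(target=target) + "."


if __name__ == "__main__":
    record = VQARecord(
        question="Which animal is sitting on the sofa?",
        answer="cat",
    )
    template = "Replace {target} with a dog"  # hypothetical edit template
    print(to_simple_instruction(record, template))
    print(to_intelligent_instruction(record, template))
```

In this sketch the second instruction never states the answer, which is the sense in which such an instruction would exercise understanding as well as generation.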

Dian Zheng, Manyuan Zhang, Hongyu Li, Hongbo Liu, Kai Zou, Kaituo Feng, Hongsheng Li • 2026

Related benchmarks

Task	Dataset	Result
Image Generation	GenEval	Overall Score: 89	69
Image Understanding	MME	Score: 2.41e+3	66
Image Editing	ImgEdit	ImgEdit: 3.51	62
Multimodal Image Understanding	MMMU	Score: 54.2	26
Image Generation	WISE	Score: 75	23
Image Editing	GEdit	GEdit Score: 7.29	16
Image Understanding	MathVista	Accuracy: 74.3	14
Image Editing	RISE	RISE Score: 17.8	14
Image Understanding	MMVP	Score: 72.1	12
Image Understanding	MMBench EN 1.0	Score: 86	12
