Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

About

Instruction-based video editing has witnessed rapid progress, yet current methods often struggle with precise visual control, as natural language is inherently limited in describing complex visual nuances. Although reference-guided editing offers a robust solution, its potential is currently bottlenecked by the scarcity of high-quality paired training data. To bridge this gap, we introduce a scalable data generation pipeline that transforms existing video editing pairs into high-fidelity training quadruplets, leveraging image generative models to create synthesized reference scaffolds. Using this pipeline, we construct RefVIE, a large-scale dataset tailored for instruction-reference-following tasks, and establish RefVIE-Bench for comprehensive evaluation. Furthermore, we propose a unified editing architecture, Kiwi-Edit, that synergizes learnable queries and latent visual features for reference semantic guidance. Our model achieves significant gains in instruction following and reference fidelity via a progressive multi-stage training curriculum. Extensive experiments demonstrate that our data and architecture establish a new state-of-the-art in controllable video editing. All datasets, models, and code is released at https://github.com/showlab/Kiwi-Edit.

Yiqi Lin, Guoqiang Liang, Ziyun Zeng, Zechen Bai, Yanzhe Chen, Mike Zheng Shou• 2026

Related benchmarks

Task	Dataset	Result
Video Editing	OpenVE-Bench	Overall Score3.02	39
Video Generation	VBench	Motion Smoothness99.3	37
Instruction-Guided Video Editing	OpenVE-Bench	Overall Score3.12	29
Video Editing	OpenVE-Bench (test)	Overall Score2.51	28
Background Replacement	OpenVE-Bench	Overall Score2.58	10
Reference-guided Video Editing	RefVIE-Bench (test)	Identity Score3.98	9
Video Editing	VLM benchmark	IA Score19.15	8
Background Replacement	Sparkle-Bench	Overall Score2.54	8
Video Editing	EditVerse-Bench 120-case source-video	Overall Score7	8
Video Editing	VLM-based Video Editing Evaluation	Background Replacement Score46.8	8

Showing 10 of 21 rows

Other info

GitHub

Follow for update

@wizwand_team Discord