Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance

About

Instruction-based video editing has witnessed rapid progress, yet current methods often struggle with precise visual control, as natural language is inherently limited in describing complex visual nuances. Although reference-guided editing offers a robust solution, its potential is currently bottlenecked by the scarcity of high-quality paired training data. To bridge this gap, we introduce a scalable data generation pipeline that transforms existing video editing pairs into high-fidelity training quadruplets, leveraging image generative models to create synthesized reference scaffolds. Using this pipeline, we construct RefVIE, a large-scale dataset tailored for instruction-reference-following tasks, and establish RefVIE-Bench for comprehensive evaluation. Furthermore, we propose a unified editing architecture, Kiwi-Edit, that synergizes learnable queries and latent visual features for reference semantic guidance. Our model achieves significant gains in instruction following and reference fidelity via a progressive multi-stage training curriculum. Extensive experiments demonstrate that our data and architecture establish a new state-of-the-art in controllable video editing. All datasets, models, and code are released at https://github.com/showlab/Kiwi-Edit.

Yiqi Lin, Guoqiang Liang, Ziyun Zeng, Zechen Bai, Yanzhe Chen, Mike Zheng Shou • 2026

Related benchmarks

Task                    Dataset                                Result                             Rank
Video Editing           OpenVE-Bench                           Overall Score 3.02                 39
Video Generation        VBench                                 Motion Smoothness 99.3             37
Video Editing           OpenVE-Bench (test)                    Overall Score 2.51                 28
Background Replacement  OpenVE-Bench                           Overall Score 2.58                 10
Video Editing           VLM benchmark                          IA Score 19.15                     8
Background Replacement  Sparkle-Bench                          Overall Score 2.54                 8
Video Editing           EditVerse-Bench 120-case source-video  Overall Score 7                    8
Video Editing           VLM-based Video Editing Evaluation     Background Replacement Score 46.8  8
Object Replacement      Occlusion-Bench                        Instance FID (Frame) 21.68         6
Object Addition         Occlusion-Bench                        Instance FID (Frame) 20.9          6
Showing 10 of 16 rows
