TokenFlow: Consistent Diffusion Features for Consistent Video Editing

About

The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we present a framework that harnesses the power of a text-to-image diffusion model for the task of text-driven video editing. Specifically, given a source video and a target text-prompt, our method generates a high-quality video that adheres to the target text, while preserving the spatial layout and motion of the input video. Our method is based on a key observation that consistency in the edited video can be obtained by enforcing consistency in the diffusion feature space. We achieve this by explicitly propagating diffusion features based on inter-frame correspondences, readily available in the model. Thus, our framework does not require any training or fine-tuning, and can work in conjunction with any off-the-shelf text-to-image editing method. We demonstrate state-of-the-art editing results on a variety of real-world videos. Webpage: https://diffusion-tokenflow.github.io/
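The propagation idea in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes each frame is represented by a matrix of per-token features, that correspondences are found by cosine-similarity nearest-neighbor search between a frame and a single keyframe, and that edited keyframe features are copied along those correspondences. The function names `nearest_neighbor_field` and `propagate_features` are hypothetical.

```python
import numpy as np


def nearest_neighbor_field(frame_feats: np.ndarray, key_feats: np.ndarray) -> np.ndarray:
    """For each token in a frame, return the index of the most similar keyframe token.

    Both inputs have shape (num_tokens, dim). Similarity is cosine similarity,
    which is an assumption made for this sketch.
    """
    a = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    b = key_feats / np.linalg.norm(key_feats, axis=1, keepdims=True)
    similarity = a @ b.T  # shape: (frame tokens, keyframe tokens)
    return np.argmax(similarity, axis=1)


def propagate_features(edited_key_feats: np.ndarray, nn_field: np.ndarray) -> np.ndarray:
    """Copy edited keyframe features to a frame along the given correspondences."""
    return edited_key_feats[nn_field]


# Toy data: the "frame" is a shuffled, slightly perturbed copy of the keyframe,
# standing in for the same content seen after some motion.
rng = np.random.default_rng(0)
source_key = rng.normal(size=(16, 8))
shuffle = rng.permutation(16)
source_frame = source_key[shuffle] + 1e-3 * rng.normal(size=(16, 8))

# Correspondences are computed on the source features only.
field = nearest_neighbor_field(source_frame, source_key)

# Edited keyframe features (random placeholders here) are then propagated.
edited_key = rng.normal(size=(16, 8))
edited_frame = propagate_features(edited_key, field)
print(edited_frame.shape)
```

Because the correspondences come from the source video alone, every frame receives edited features drawn from the same keyframe, which is one way to read the abstract's claim that consistency in feature space yields consistency in the edited video.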

Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Text-to-Image Generation | GenEval | Overall Score | 55 | 467 |
| Visual Reasoning | MM-Vet | Score | 40.7 | 34 |
| Video Editing | DAVIS (first 33 frames) | Background MSE | 1.17e+3 | 14 |
| Video Object Retexturing | Pexels video dataset (test) | Background MSE | 889.9 | 14 |
| Video Editing | NRVBench V1 (full) | Distortion (x10^3) | 111.9 | 14 |
| Multimodal Understanding | MMBench v1.1 (dev) | MMBench Score | 68.9 | 14 |
| Multi-discipline Reasoning | MMMU standard (test) | MMMU Score | 38.7 | 14 |
| Video Editing | EditVerseBench Appearance (test) | Pick Score | 20.02 | 12 |
| Sketch-based video editing | Sketch-based video editing dataset (test) | LPIPS | 12.92 | 9 |
| Instructional Video Editing | FiVE (test) | FiVE YN | 19.36 | 9 |
Showing 10 of 32 rows
