I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models

About

The remarkable generative capabilities of diffusion models have motivated extensive research in both image and video editing. Compared to video editing, which faces additional challenges in the temporal dimension, image editing has seen the development of more diverse, high-quality approaches and more capable software such as Photoshop. In light of this gap, we introduce a novel and generic solution that extends the applicability of image editing tools to videos by propagating edits from a single frame to the entire video using a pre-trained image-to-video model. Our method, dubbed I2VEdit, adaptively preserves the visual and motion integrity of the source video depending on the extent of the edits, effectively handling global edits, local edits, and moderate shape changes, which existing methods cannot fully achieve. At the core of our method are two main processes: Coarse Motion Extraction, which aligns basic motion patterns with the original video, and Appearance Refinement, which makes precise adjustments using fine-grained attention matching. We also incorporate a skip-interval strategy to mitigate the quality degradation caused by auto-regressive generation across multiple video clips. Experimental results demonstrate our framework's superior performance in fine-grained video editing, confirming its ability to produce high-quality, temporally consistent outputs.

Wenqi Ouyang, Yi Dong, Lei Yang, Jianlou Si, Xingang Pan • 2024
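The abstract attributes part of the quality loss in long videos to auto-regressive generation across multiple clips. The sketch below is a minimal illustration, under stated assumptions, of that plain chaining scheme: each new clip is conditioned on the last frame of the previous clip, so any artifact in that frame is carried forward. The names `generate_long_video` and `generate_clip` are hypothetical stand-ins for a pre-trained image-to-video model. This is not the paper's skip-interval strategy or its code, which the abstract does not detail; it only shows the baseline behaviour that strategy is meant to mitigate.

```python
from typing import Callable, List, TypeVar

F = TypeVar("F")  # frame type: any image/tensor representation (assumption)


def generate_long_video(
    edited_first_frame: F,
    num_clips: int,
    generate_clip: Callable[[F], List[F]],
) -> List[F]:
    """Naive auto-regressive chaining of image-to-video clips (hypothetical).

    Clip k is conditioned on the last frame produced by clip k-1, so errors
    in that generated frame propagate into every later clip. This is the
    baseline scheme, NOT an implementation of the skip-interval strategy.
    """
    if num_clips < 1:
        raise ValueError("num_clips must be at least 1")
    video: List[F] = [edited_first_frame]
    condition = edited_first_frame
    for _ in range(num_clips):
        clip = generate_clip(condition)  # stand-in for an I2V model call
        if not clip:
            raise ValueError("generate_clip returned an empty clip")
        video.extend(clip)
        condition = clip[-1]  # next clip starts from a *generated* frame
    return video


if __name__ == "__main__":
    # Toy model: frames are integers counting accumulated error units,
    # and each generated clip adds one unit on top of its conditioning frame.
    toy_model = lambda err: [err + 1] * 4
    video = generate_long_video(0, 3, toy_model)
    print(len(video))  # 13 frames: the edited first frame plus 3 clips of 4
    print(video[-1])   # 3: the error grows by one unit per chained clip
```

In the toy run, the final frame's error equals the number of chained clips, which mirrors why quality tends to drift as more clips are generated from previously generated frames.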

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Sketch-based video editing | Sketch-based video editing dataset (test) | LPIPS | 10.98 | 9
Video Editing | VBench | MS | 99.1 | 8
First-frame-guided video editing | I2V-Edit-Benchmark | CLIP Score | 0.909 | 7
Multi-object motion transfer | Custom multi-object motion transfer, 200 sequences (test) | AC (Automatic) | 91.5 | 6
Video Editing | User Study | Editing Consistency Score | 78.5 | 6
Trajectory-guided Video Editing | 40 videos (test) | SSIM (Background) | 0.17 | 6
Egocentric-to-Exocentric Video Translation | EgoExo Real-World Scenarios 8K | LPIPS | 0.6892 | 5
Exocentric-to-Egocentric Video Translation | EgoExo Real-World Scenarios 8K | LPIPS | 0.6945 | 5
Egocentric-to-Exocentric Video Translation | EgoExo Synthetic Scenarios 8K | LPIPS | 0.6385 | 5
Exocentric-to-Egocentric Video Translation | EgoExo Synthetic Scenarios 8K | LPIPS | 0.6272 | 5
