Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

About

Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to the video domain, ensuring temporal consistency across video frames remains a formidable challenge. This paper proposes a novel zero-shot text-guided video-to-video translation framework to adapt image models to videos. The framework includes two parts: key frame translation and full video translation. The first part uses an adapted diffusion model to generate key frames, with hierarchical cross-frame constraints applied to enforce coherence in shapes, textures and colors. The second part propagates the key frames to other frames with temporal-aware patch matching and frame blending. Our framework achieves temporal consistency in both global style and local texture at a low cost (without re-training or optimization). The adaptation is compatible with existing image diffusion techniques, allowing our framework to take advantage of them, such as customizing a specific subject with LoRA and introducing extra spatial guidance with ControlNet. Extensive experimental results demonstrate the effectiveness of our proposed framework over existing methods in rendering high-quality and temporally coherent videos.
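The two-part structure described above (translate a sparse set of key frames, then fill in the remaining frames from them) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the fixed key-frame `interval`, the flattened-list frame representation, and the `translate_key_frame` callback are all hypothetical, and plain linear blending of the two nearest translated key frames stands in for the paper's temporal-aware patch matching and frame blending.

```python
from typing import Callable, List, Sequence

# Toy representation (assumption): a frame is a flat list of pixel intensities.
Frame = List[float]


def select_key_frames(num_frames: int, interval: int) -> List[int]:
    """Pick every `interval`-th frame index, always including the last frame."""
    if num_frames <= 0 or interval <= 0:
        raise ValueError("num_frames and interval must be positive")
    keys = list(range(0, num_frames, interval))
    if keys[-1] != num_frames - 1:
        keys.append(num_frames - 1)
    return keys


def blend(a: Frame, b: Frame, w: float) -> Frame:
    """Linear blend: weight (1 - w) on `a` and w on `b`."""
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def translate_video(
    frames: Sequence[Frame],
    translate_key_frame: Callable[[Frame], Frame],
    interval: int,
) -> List[Frame]:
    """Part 1: translate key frames only. Part 2: fill in the other frames.

    `translate_key_frame` is a placeholder for the (expensive) image model.
    Non-key frames are a distance-weighted blend of the two nearest
    translated key frames, a simplification that ignores frame content.
    """
    keys = select_key_frames(len(frames), interval)
    translated = {k: translate_key_frame(frames[k]) for k in keys}

    output: List[Frame] = []
    for i in range(len(frames)):
        if i in translated:
            output.append(translated[i])
            continue
        left = max(k for k in keys if k < i)   # nearest key frame before i
        right = min(k for k in keys if k > i)  # nearest key frame after i
        w = (i - left) / (right - left)
        output.append(blend(translated[left], translated[right], w))
    return output
```

The point of the sketch is the cost profile: the expensive per-frame model runs only on the key frames, while every other frame is derived cheaply from its neighbours.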

Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy • 2023

Related benchmarks

Task	Dataset	Metric	Value	Rank
Video Subject Swapping	Custom Video Subject Swapping dataset human-evaluated (test)	Subject Identity	19	14
Video Enhancement	VC2	MS	98.2	7
Video Enhancement	AD2	MS Score	0.975	7
Zero-shot Text-guided Video Editing	Curated dataset 90-frames	CLIP-F	90.63	7
Video Editing	HOSNeRF and NeuMan (test)	CLIPScore	26.11	6
Video Stylization	TVSBench	CLIP-T	20.62	6
Zero-shot Text-guided Video Editing	Curated dataset 8-frames	CLIP-F	92.87	6
Video Subject Swapping	Shutterstock and DAVIS predefined concepts (test)	Text Alignment	24.99	5
Zero-shot Text-guided Video Editing	Curated dataset 36-frames	CLIP-F	8.97e+3	5
Text-to-Video Stylization	Pexels 50 videos (TV2V)	CLIP-T	0.2272	4

Showing 10 of 13 rows
