
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis

About

In this paper, we introduce Fairy, a minimalist yet robust adaptation of image-editing diffusion models that enhances them for video editing applications. Our approach centers on anchor-based cross-frame attention, a mechanism that implicitly propagates diffusion features across frames, ensuring superior temporal coherence and high-fidelity synthesis. Fairy not only addresses limitations of previous models, including memory and processing speed, but also improves temporal consistency through a unique data augmentation strategy that renders the model equivariant to affine transformations in both source and target images. Remarkably efficient, Fairy generates 120-frame 512x384 videos (4-second duration at 30 FPS) in just 14 seconds, outpacing prior works by at least 44x. A comprehensive user study involving 1000 generated samples confirms that our approach delivers superior quality, decisively outperforming established methods.
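
The abstract names anchor-based cross-frame attention but gives no implementation details, so the following is only a rough sketch of the general idea: queries from every frame attend to keys and values taken from a small set of anchor frames. The function name, the tensor shapes, and the choice to pool all anchor tokens into one shared bank are assumptions for illustration, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def anchor_cross_frame_attention(q, k, v, anchor_idx):
    """Hypothetical helper: q, k, v have shape (frames, tokens, dim).

    Every frame's queries attend only to the keys/values of the
    anchor frames, so all frames read from the same feature bank.
    """
    dim = q.shape[-1]
    # Pool the anchor frames' tokens into one shared bank: (A * tokens, dim).
    k_anchor = k[anchor_idx].reshape(-1, dim)
    v_anchor = v[anchor_idx].reshape(-1, dim)
    # Scaled dot-product scores: (frames, tokens, A * tokens).
    scores = q @ k_anchor.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    # Weighted sum of anchor values: (frames, tokens, dim).
    return weights @ v_anchor
```

Because no frame depends on the output of another non-anchor frame in this sketch, the per-frame computations are independent once the anchor features are available, which is the kind of structure that lends itself to parallel processing.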

Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang, Yichen Jia, Kapil Krishnakumar, Tong Xiao, Feng Liang, Licheng Yu, Peter Vajda • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Editing | TGVE benchmark | Pick Score | 19.8 | 11 |
| Video Editing | TGVE (test) | ViCLIP_out | 0.208 | 9 |
| Video Editing | TGVE+ (test) | ViCLIP_out | 0.197 | 9 |
| Video Editing | Senorita (test) | DVS Score | 0.4 | 8 |
| Text-guided Video Editing | Video Editing 4s, 30 FPS, 512p x 384p (test) | Latency (s) | 13.8 | 3 |
