
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models

About

Text-to-video diffusion models have advanced video generation significantly. However, customizing these models to generate videos with tailored motions presents a substantial challenge. Specifically, they encounter hurdles in (a) accurately reproducing motion from a target video, and (b) creating diverse visual variations. For example, straightforward extensions of static image customization methods to video often lead to intricate entanglements of appearance and motion data. To tackle this, we present the Video Motion Customization (VMC) framework, a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models. Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference. The diffusion process then preserves low-frequency motion trajectories while mitigating high-frequency, motion-unrelated noise in image space. We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts. Our code, data, and project demo are available at https://video-motion-customization.github.io
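The core idea of using residual vectors between consecutive frames as a motion reference can be sketched as follows. This is a hypothetical NumPy illustration, not the paper's implementation: the function names and the cosine-based alignment loss are assumptions chosen to show why consecutive-frame residuals capture motion while ignoring appearance.

```python
import numpy as np

def frame_residuals(frames):
    # frames: (T, H, W, C) array of T video frames.
    # Residual vectors between consecutive frames serve as the motion reference.
    return frames[1:] - frames[:-1]

def motion_alignment_loss(pred_frames, target_frames):
    # Illustrative (not the paper's exact objective): align the direction of
    # the predicted residuals with the target residuals via cosine distance,
    # so the loss tracks motion trajectories rather than raw appearance.
    dp = frame_residuals(pred_frames).reshape(len(pred_frames) - 1, -1)
    dt = frame_residuals(target_frames).reshape(len(target_frames) - 1, -1)
    cos = np.sum(dp * dt, axis=1) / (
        np.linalg.norm(dp, axis=1) * np.linalg.norm(dt, axis=1) + 1e-8
    )
    return float(np.mean(1.0 - cos))
```

Note that adding a constant appearance offset to every frame leaves the residuals, and hence the loss, unchanged, which is exactly the appearance/motion disentanglement the abstract describes.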

Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Video Generation | VBench | – | 102 |
| Motion Customization | TGVE 76 videos (full) | Text Alignment: 25.53 | 12 |
| Video Motion Transfer | DAVIS | Text Similarity: 23.88 | 8 |
| Motion Transfer | 26 videos and 56 text-video pairs | Text Alignment (Automated): 32.56 | 5 |
| Text-guided Video Editing | 24 videos (full) | Text Alignment (CLIP): 0.801 | 5 |
| Motion Customization | TGVE User Study (test) | Motion Fidelity: 3.8 | 4 |
