DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

About

Animating a still image offers an engaging visual experience. Traditional image animation techniques mainly focus on natural scenes with stochastic dynamics (e.g., clouds and fluid) or domain-specific motions (e.g., human hair or body motions), which limits their applicability to more general visual content. To overcome this limitation, we explore the synthesis of dynamic content for open-domain images, converting them into animated videos. The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance. Given an image, we first project it into a text-aligned rich context representation space using a query transformer, which helps the video model digest the image content in a compatible fashion. However, some visual details are still not well preserved in the resulting videos. To supply more precise image information, we further feed the full image to the diffusion model by concatenating it with the initial noise. Experimental results show that our method produces visually convincing motion that is more logical and natural, with higher conformity to the input image. Comparative evaluation demonstrates that our approach notably outperforms existing competitors.
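
The two image-injection paths described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, tensor shapes, single-head single-layer attention, random projection weights, and the choice to concatenate along the channel axis are all assumptions made for illustration.

```python
import numpy as np


def query_transformer_context(image_tokens, queries, w_k, w_v):
    """Cross-attend a fixed set of queries to image tokens.

    Hypothetical single-head, single-layer stand-in for a query
    transformer: it maps a variable number of image tokens to a
    fixed-length context of shape (n_queries, d).
    """
    d = queries.shape[-1]
    keys = image_tokens @ w_k                      # (n_tokens, d)
    values = image_tokens @ w_v                    # (n_tokens, d)
    scores = queries @ keys.T / np.sqrt(d)         # (n_queries, n_tokens)
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ values                        # (n_queries, d)


def concat_image_with_noise(image_latent, n_frames, rng):
    """Concatenate the image latent with per-frame initial noise.

    Assumption: the image latent is repeated for every frame and joined
    to the noise along the channel axis, doubling the channel count.
    """
    c, h, w = image_latent.shape
    noise = rng.standard_normal((n_frames, c, h, w))
    repeated = np.broadcast_to(image_latent, (n_frames, c, h, w))
    return np.concatenate([noise, repeated], axis=1)  # (n_frames, 2c, h, w)


rng = np.random.default_rng(0)

# Path 1: text-aligned context (all sizes below are illustrative).
image_tokens = rng.standard_normal((257, 64))   # e.g. image-encoder tokens
queries = rng.standard_normal((16, 32))         # stand-in for learned queries
w_k = rng.standard_normal((64, 32))
w_v = rng.standard_normal((64, 32))
context = query_transformer_context(image_tokens, queries, w_k, w_v)

# Path 2: full-image guidance joined to the initial noise.
image_latent = rng.standard_normal((4, 8, 8))
denoiser_input = concat_image_with_noise(image_latent, n_frames=16, rng=rng)

print(context.shape, denoiser_input.shape)
```

In this sketch the context tokens would be consumed by the video model's cross-attention layers alongside the text embedding, while the concatenated tensor would be the denoiser's input; both of those wiring details are likewise assumptions rather than facts stated in the abstract.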

Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Xintao Wang, Tien-Tsin Wong, Ying Shan • 2023

Related benchmarks

Task	Dataset	Metric	Result
Image-to-Video Generation	VBench	Motion Smoothness	0.9746	46
3D Scene Generation	WorldScore	Camera Control	25.15	33
Video Frame Interpolation	MultiInterpBench	FID	42.3	24
Image-to-Video	VBench I2V	Quality Overall Score	80.46	16
Video Frame Interpolation	BS-ERGB 3 skips	PSNR	15.47	15
Wildfire Spread Prediction (Mask Video)	Multi-Region Datasets (Seen Region)	AUPRC	0.72	11
Wildfire Spread Prediction (Mask Video)	Multi-Region Datasets (Unseen Region)	AUPRC	67	11
Fire mask prediction	Single-region wildfire dataset (test)	AUPRC	74	11
Human Image Animation	Curated (test)	CPBD	0.8333	9
Motion Generation	Kubric	FVMD	4.14e+4	9

Showing 10 of 45 rows
