MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model

About

This paper studies the human image animation task, which aims to generate a video of a given reference identity following a particular motion sequence. Existing animation works typically employ frame warping to animate the reference image toward the target motion. Despite achieving reasonable results, these approaches struggle to maintain temporal consistency throughout the animation because they lack temporal modeling and preserve the reference identity poorly. In this work, we introduce MagicAnimate, a diffusion-based framework that aims to enhance temporal consistency, preserve the reference image faithfully, and improve animation fidelity. To achieve this, we first develop a video diffusion model to encode temporal information. Second, to maintain appearance coherence across frames, we introduce a novel appearance encoder to retain the intricate details of the reference image. Building on these two innovations, we further employ a simple video fusion technique to encourage smooth transitions in long video animations. Empirical results demonstrate the superiority of our method over baseline approaches on two benchmarks. Notably, our approach outperforms the strongest baseline by over 38% in terms of video fidelity on the challenging TikTok dancing dataset. Code and model will be made available.
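The abstract does not spell out how the video fusion step works. One common way to get smooth transitions across a long sequence is to process overlapping segments and average the frames where segments overlap. The sketch below illustrates that idea only; it is an assumption, not the authors' implementation. The names `sliding_windows`, `fuse_segments`, and `process_segment`, and the default `window` and `stride` values, are all hypothetical.

```python
import numpy as np


def sliding_windows(num_frames, window, stride):
    """Return (start, end) index pairs of overlapping segments covering every frame."""
    if stride > window:
        raise ValueError("stride must not exceed window, or frames would be skipped")
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    # Add a final segment aligned to the end if the tail is not yet covered.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def fuse_segments(frames, process_segment, window=16, stride=12):
    """Run process_segment on overlapping segments and average the overlaps.

    frames: array of shape (num_frames, ...).
    process_segment: hypothetical per-segment model call; it must return an
    array with the same shape as the segment it receives.
    """
    num_frames = frames.shape[0]
    total = np.zeros_like(frames, dtype=np.float64)
    counts = np.zeros(num_frames, dtype=np.float64)
    for start, end in sliding_windows(num_frames, window, stride):
        total[start:end] += process_segment(frames[start:end])
        counts[start:end] += 1.0
    # Broadcast the per-frame counts over the remaining dimensions.
    shape = (num_frames,) + (1,) * (frames.ndim - 1)
    return total / counts.reshape(shape)
```

With `window=16` and `stride=12`, neighbouring segments share four frames, so each shared frame is the mean of two segment outputs. A larger overlap gives smoother transitions at the cost of more model calls.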

Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou • 2023

Related benchmarks

Task	Dataset	Result
Fashion video synthesis	UBC fashion video dataset (test)	SSIM 0.602	18
Human Dance Generation	TikTok (test)	SSIM 0.714	17
Character Image Animation	Follow-Your-Pose V2	LPIPS 0.183	15
Human Image Animation	TikTok	FVD 179.1	15
Human Image Animation	TikTok (test)	FVD 876	15
Video Generation	TikTok (test)	SSIM 0.748	11
Keypoint-based Portrait Animation	Portrait Animation	CPBD 0.3852	10
Human Image Animation	TikTok (sequences 335 to 340)	FID-VID 16.2	10
Human Image Animation	Unseen100	L1 Loss 3.23e+4	9
Human Reconstruction	THuman 2.0 (test)	PSNR 14.501	9

Showing 10 of 31 rows
