MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

About

In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in various aspects, such as controllability, video length, and richness of details, which hinder the application and popularization of this technology. In this work, we propose a controllable video generation framework, dubbed MimicMotion, which can generate high-quality videos of arbitrary length mimicking specific motion guidance. Compared with previous methods, our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which significantly reduces image distortion. Lastly, for generating long and smooth videos, we propose a progressive latent fusion strategy. By this means, we can produce videos of arbitrary length with acceptable resource consumption. With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in various aspects. Detailed results and comparisons are available on our project page: https://tencent.github.io/MimicMotion .

Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou • 2024

Related benchmarks

Task	Dataset	Result
Human Image Animation	RealisDance (val)	Subject Consistency: 92.21	27
Image-to-Video Generation	VBench I2V	Background Consistency: 88.47	24
Human Image Animation	TikTok	FVD: 326.6	15
Character Image Animation	Follow-Your-Pose V2	LPIPS: 0.292	15
Human-Object Interaction Video Generation	AnchorCrafter (test)	MS (Motion Score): 99.18	14
Audio-driven half-body human video generation	EMTD 1.0 (evaluation set)	FID: 53.47	14
Video Generation	Tiktok (test)	SSIM: 0.88	11
Human Image Animation	TikTok (sequences 335 to 340)	FID-VID: 9.3	10
Motion-Controlled Video Generation	RealisDance (val)	Average Score: 82.27	10
Pose-guided video generation	Fashion (test)	PSNR: 23.8	9

Showing 10 of 53 rows
