MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

About

Human motion modeling is important for many modern graphics applications, which typically require professional skills. To remove this skill barrier for laymen, recent motion generation methods can directly generate human motions conditioned on natural language. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distributions and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts. Our experiments show that MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation.

Homepage: https://mingyuan-zhang.github.io/projects/MotionDiffuse.html

Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu • 2022
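
The "series of denoising steps in which variations are injected" described in the abstract can be illustrated with a generic DDPM-style sampling loop. This is a minimal sketch, not the authors' implementation: the linear noise schedule, the step count, the flat pose representation, and the names `linear_betas`, `toy_denoiser`, and `sample_motion` are all assumptions made for illustration. In a real system the denoiser would be a trained text-conditioned network.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def toy_denoiser(x_t, t, text):
    """Hypothetical stand-in for a trained noise predictor.

    It ignores its inputs and predicts zero noise; it exists only so the
    loop below can run end to end.
    """
    return [0.0 for _ in x_t]


def sample_motion(denoiser, text, num_frames, pose_dim, num_steps=50, seed=None):
    """Generate one motion (num_frames x pose_dim) by iterative denoising."""
    rng = random.Random(seed)
    betas = linear_betas(num_steps)
    alphas = [1.0 - b for b in betas]

    # Cumulative products of alpha_t, used to scale the predicted noise.
    alpha_bars = []
    prod = 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from pure Gaussian noise over all frames and pose dimensions.
    x = [rng.gauss(0.0, 1.0) for _ in range(num_frames * pose_dim)]

    for t in reversed(range(num_steps)):
        eps = denoiser(x, t, text)  # predicted noise, conditioned on the text
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(alphas[t])
        # Fresh noise is injected at every step except the last one.
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0
        x = [
            inv_sqrt_alpha * (xi - coef * ei) + sigma * rng.gauss(0.0, 1.0)
            for xi, ei in zip(x, eps)
        ]

    # Reshape the flat vector into a list of per-frame poses.
    return [x[i * pose_dim:(i + 1) * pose_dim] for i in range(num_frames)]


motion_a = sample_motion(toy_denoiser, "a person waves", 4, 3, num_steps=10, seed=0)
motion_b = sample_motion(toy_denoiser, "a person waves", 4, 3, num_steps=10, seed=1)
```

Because noise is drawn at the start and again at each step, the same prompt with different seeds yields different motions, which is the sense in which the mapping is probabilistic rather than deterministic.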

Related benchmarks

Task	Dataset	Result
Text-to-motion generation	HumanML3D (test)	FID 0.287	576
text-to-motion mapping	HumanML3D (test)	FID 0.63	298
text-to-motion mapping	KIT-ML (test)	R Precision (Top 3) 0.739	275
Text-to-motion generation	KIT-ML (test)	FID 1.934	206
Text-to-motion generation	HumanML3D	FID 0.63	96
Text-driven Motion Generation	HumanML3D (test)	R-Precision@1 49.1	80
Text-to-Motion Synthesis	KIT-ML	R Precision Top 3 73.9	61
Text-to-motion	KIT-ML	R@3 73.9	44
Text-to-Motion Synthesis	HumanML3D	R-Precision (Top 1) 64.5	43
Interactive Motion Synthesis	InterHuman (test)	R Precision (Top 1) 40.1	37

Showing 10 of 55 rows
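
Several rows above report R Precision at top 1 or top 3. The metric is commonly computed as a top-k retrieval accuracy between motion features and text features, sketched below. This is an illustration under assumptions: the function name `r_precision`, the use of plain Euclidean distance, and the toy feature vectors are hypothetical, and the exact protocol behind each row (feature extractor, candidate pool size) is not stated on this page.

```python
import math


def r_precision(motion_feats, text_feats, k=3):
    """Fraction of motions whose matching text ranks in the top k.

    motion_feats[i] and text_feats[i] are assumed to be a matched pair;
    candidates are ranked by Euclidean distance (smaller is better).
    """
    hits = 0
    for i, motion in enumerate(motion_feats):
        dists = [math.dist(motion, text) for text in text_feats]
        ranked = sorted(range(len(text_feats)), key=lambda j: dists[j])
        if i in ranked[:k]:
            hits += 1
    return hits / len(motion_feats)


# Toy features: the third motion is closer to the second text than to its own.
motions = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
texts = [[0.0, 0.1], [1.0, 1.1], [9.0, 9.0]]

top1 = r_precision(motions, texts, k=1)
top2 = r_precision(motions, texts, k=2)
```

The function returns a fraction between 0 and 1. The table appears to mix that convention (0.739) with percentages (73.9), so values should be put on the same scale before rows are compared.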
