Human Motion Diffusion as a Generative Prior

About

Recent work has demonstrated the significant potential of denoising diffusion models for generating human motion, including text-to-motion capabilities. However, these methods are restricted by the paucity of annotated motion data, a focus on single-person motions, and a lack of detailed control. In this paper, we introduce three forms of composition based on diffusion priors: sequential, parallel, and model composition. Using sequential composition, we tackle the challenge of long sequence generation. We introduce DoubleTake, an inference-time method with which we generate long animations consisting of sequences of prompted intervals and their transitions, using a prior trained only for short clips. Using parallel composition, we show promising steps toward two-person generation. Beginning with two fixed priors as well as a few two-person training examples, we learn a slim communication block, ComMDM, to coordinate interaction between the two resulting motions. Lastly, using model composition, we first train individual priors to complete motions that realize a prescribed motion for a given joint. We then introduce DiffusionBlending, an interpolation mechanism to effectively blend several such models to enable flexible and efficient fine-grained joint and trajectory-level control and editing. We evaluate the composition methods using an off-the-shelf motion diffusion model, and further compare the results to dedicated models trained for these specific tasks.

Yonatan Shafir, Guy Tevet, Roy Kapon, Amit H. Bermano • 2023
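The abstract describes DiffusionBlending only as an interpolation mechanism that blends several fine-tuned priors. As a rough illustration of that idea, and not the paper's actual formulation, the sketch below assumes each prior is a callable that maps a noisy motion and a timestep to a prediction, and that blending is a normalized weighted average of those predictions. The names `blend_predictions`, `blended_denoiser`, `Motion`, and `Denoiser` are hypothetical and do not come from the authors' code.

```python
from typing import Callable, List, Sequence

# Hypothetical types for illustration: a motion is a flat list of features,
# and a denoiser maps (noisy motion, timestep) to a predicted motion.
Motion = List[float]
Denoiser = Callable[[Motion, int], Motion]


def blend_predictions(preds: Sequence[Motion], weights: Sequence[float]) -> Motion:
    """Interpolate several model predictions with normalized weights.

    Assumption: plain linear interpolation; the paper may use a different rule.
    """
    if not preds or len(preds) != len(weights):
        raise ValueError("need one weight per prediction")
    if any(len(p) != len(preds[0]) for p in preds):
        raise ValueError("all predictions must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    norm = [w / total for w in weights]
    # Weighted sum, feature by feature.
    return [
        sum(w * p[i] for w, p in zip(norm, preds))
        for i in range(len(preds[0]))
    ]


def blended_denoiser(models: Sequence[Denoiser], weights: Sequence[float]) -> Denoiser:
    """Wrap several denoisers into one that blends their outputs at every step."""

    def step(x_t: Motion, t: int) -> Motion:
        return blend_predictions([m(x_t, t) for m in models], weights)

    return step


# Toy stand-ins for two priors, each fine-tuned to control a different joint.
def root_prior(x_t: Motion, t: int) -> Motion:
    return [1.0 for _ in x_t]


def hand_prior(x_t: Motion, t: int) -> Motion:
    return [3.0 for _ in x_t]


combined = blended_denoiser([root_prior, hand_prior], weights=[1.0, 1.0])
blended = combined([0.0, 0.0], 10)
```

Because the blend is applied to the per-step predictions, a wrapper like this could in principle be dropped into an existing sampling loop without retraining either prior, which is the kind of inference-time flexibility the abstract claims.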

Related benchmarks

Task	Dataset	Result
Motion Control	HumanML3D (test)	Average Error 0.4417	82
Interactive Motion Synthesis	InterHuman (test)	R Precision (Top 1) 22.3	37
Text-to-motion generation	HumanML3D 19 (test)	FID 0.6	37
text-conditioned human interaction generation	InterHuman (test)	R Precision (Top 3) 46.6	36
Human Motion Generation	HumanML3D (test)	FID 0.475	27
Human-Object Interaction Generation	OMOMO (test)	FID 0.329	24
Human-human interaction motion generation	InterHuman	FID 7.069	23
Human-Object Interaction Generation	BEHAVE (test)	FID 0.328	21
Text-to-Interaction Motion Generation	InterHuman (test)	Interaction Alignment 0.577	19
Trajectory-controlled human motion generation	KIT-ML (test)	FID 0.851	19

Showing 10 of 53 rows
