RDM: Recurrent Diffusion Model for Human Motion Generation

About

Human motion generation is a challenging task due to its high dimensionality and the difficulty of generating fine-grained motions. Diffusion models have been proposed for this task because of their high sample quality and expressiveness. Early approaches treat the entire sequence as a whole, which is computationally expensive and restricts sequence length. In contrast, autoregressive diffusion models generate longer sequences. However, their reliance on fully denoising previous frames complicates training and inference. Consequently, we propose RDM, a new recurrent diffusion formulation similar to Recurrent Neural Networks (RNNs). RDMs explicitly condition diffusion processes on preceding noisy frames, avoiding the cost of full denoising. Nonetheless, maintaining the model's probabilistic nature is non-trivial. Therefore, we employ Normalizing Flows to model the recurrent connections. Our evaluations demonstrate RDM's effectiveness: it achieves performance comparable to autoregressive baselines and generates long sequences that remain aligned with the text. RDM can also skip diffusion steps during inference, significantly reducing computational cost.

Mirgahney Mohamed, Harry Jake Cunningham, Marc P. Deisenroth, Lourdes Agapito • 2024

Related benchmarks

Task	Dataset	Result	Rank
Text-to-motion mapping	HumanML3D (test)	FID 0.07	283
Text-to-motion generation	KIT-ML (test)	FID 0.299	206
