RDM: Recurrent Diffusion Model for Human Motion Generation
About
Human motion generation is a challenging task due to its high dimensionality and the difficulty of generating fine-grained motions. Diffusion methods have been proposed for this task because of their high sample quality and expressiveness. Early approaches treat the entire sequence as a whole, which is computationally expensive and restricts sequence length. In contrast, autoregressive diffusion models generate longer sequences. However, their reliance on fully denoising previous frames complicates training and inference. We therefore propose RDM, a new recurrent diffusion formulation similar to Recurrent Neural Networks (RNNs). RDMs explicitly condition diffusion processes on preceding noisy frames, avoiding the cost of full denoising. Nonetheless, maintaining the probabilistic nature of the model under this conditioning is non-trivial. Therefore, we employ Normalizing Flows to model recurrent connections. Our evaluations demonstrate RDM's effectiveness: it achieves comparable performance to autoregressive baselines and generates long sequences that remain aligned with the text. RDM also skips diffusion steps during inference, significantly reducing computational cost.
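To make the idea of a probabilistic recurrent connection more concrete, the sketch below shows one way a normalizing flow can map base noise to the next noisy frame while conditioning on the previous noisy frame. It is an illustration under stated assumptions, not the RDM architecture: the abstract does not specify the flow, so the elementwise affine form, the parameters `a` and `b`, and the function names (`flow_params`, `flow_forward`, `flow_log_prob`, `rollout`) are all hypothetical.

```python
import math
import random


def flow_params(prev_noisy, a, b):
    """Shift and bounded log-scale computed from the previous noisy frame.

    `a` and `b` are hypothetical per-dimension parameters (assumption).
    """
    mu = [ai * p for ai, p in zip(a, prev_noisy)]
    log_s = [math.tanh(bi * p) for bi, p in zip(b, prev_noisy)]
    return mu, log_s


def flow_forward(z, prev_noisy, a, b):
    """Map base noise z to a noisy frame x = mu + exp(log_s) * z.

    Returns x and the log-determinant of the Jacobian dx/dz.
    """
    mu, log_s = flow_params(prev_noisy, a, b)
    x = [m + math.exp(s) * zi for m, s, zi in zip(mu, log_s, z)]
    return x, sum(log_s)


def flow_log_prob(x, prev_noisy, a, b):
    """Exact log-density of x given the previous noisy frame.

    Uses the change-of-variables formula with a standard normal base.
    """
    mu, log_s = flow_params(prev_noisy, a, b)
    z = [(xi - m) * math.exp(-s) for xi, m, s in zip(x, mu, log_s)]
    base = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return base - sum(log_s)


def rollout(first_noisy, num_frames, a, b, rng):
    """Chain noisy frames: each one is sampled conditioned on the previous
    noisy frame, with no intermediate denoising."""
    frames = [first_noisy]
    for _ in range(num_frames - 1):
        z = [rng.gauss(0.0, 1.0) for _ in first_noisy]
        x, _ = flow_forward(z, frames[-1], a, b)
        frames.append(x)
    return frames
```

Because the transform is invertible with a tractable Jacobian, the conditional density of each noisy frame stays exact, which is the property a plain deterministic recurrence would lose. A call such as `rollout([0.2, -1.0, 0.7], 5, a, b, random.Random(0))` returns five chained noisy frames of the same dimensionality.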
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Text-to-motion mapping | HumanML3D (test) | FID 0.07 | 283 |
| Text-to-motion generation | KIT-ML (test) | FID 0.299 | 189 |