Guided Motion Diffusion for Controllable Human Motion Synthesis
About
Denoising diffusion models have shown great promise in human motion synthesis conditioned on natural language descriptions. However, integrating spatial constraints, such as pre-defined motion trajectories and obstacles, remains a challenge despite being essential for bridging the gap between isolated human motion and its surrounding environment. To address this issue, we propose Guided Motion Diffusion (GMD), a method that incorporates spatial constraints into the motion generation process. Specifically, we propose an effective feature projection scheme that manipulates motion representation to enhance the coherency between spatial information and local poses. Together with a new imputation formulation, the generated motion can reliably conform to spatial constraints such as global motion trajectories. Furthermore, given sparse spatial constraints (e.g. sparse keyframes), we introduce a new dense guidance approach to turn a sparse signal, which is susceptible to being ignored during the reverse steps, into denser signals to guide the generated motion to the given constraints. Our extensive experiments justify the development of GMD, which achieves a significant improvement over state-of-the-art methods in text-based motion generation while allowing control of the synthesized motions with spatial constraints.
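The imputation idea mentioned above can be illustrated with a minimal sketch. It shows only the generic masked-overwrite step commonly used for diffusion imputation, not GMD's own imputation formulation, feature projection, or dense guidance. The function name `impute`, the array shapes, and the choice of which features are constrained are all hypothetical and chosen for illustration.

```python
import numpy as np

def impute(x0_pred, target, mask):
    """Overwrite constrained entries of a predicted clean motion with target values.

    x0_pred: (frames, features) model estimate of the clean motion (hypothetical layout)
    target:  (frames, features) desired values, only read where mask is True
    mask:    (frames, features) boolean, True where a spatial constraint applies
    """
    return np.where(mask, target, x0_pred)

# Toy motion: 4 frames, 3 features. Features 0-1 stand in for a ground-plane
# position; feature 2 stands in for everything else (an assumption for this demo).
x0_pred = np.zeros((4, 3))
target = np.ones((4, 3))

# Sparse constraint: fix the position at the first and last frames only.
mask = np.zeros((4, 3), dtype=bool)
mask[[0, 3], :2] = True

x0_imputed = impute(x0_pred, target, mask)
```

With only two of four frames constrained, most entries are left to the model. The abstract describes such sparse signals as susceptible to being ignored during the reverse steps, which is the motivation it gives for the dense guidance approach.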
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Motion Completion | HumanML3D (test) | MPJPE | 25.7 | 40 |
| Motion Control | HumanML3D (test) | Average Error | 14.39 | 34 |
| Motion Editing | HumanML3D | Content Preservation | 0.79 | 12 |
| Motion Generation | ADT curated P&R sequences | Prime Success | 20.69 | 8 |
| Motion Generation | HD-EPIC curated P&R sequences | Prime Success | 29.23 | 8 |
| Motion Generation | HOT3D curated P&R sequences | Prime Success | 34.04 | 8 |
| Motion Generation | MoGaze curated P&R sequences | Prime Success | 2.01 | 8 |
| Motion Generation | GIMO curated P&R sequences | Prime Success | 0.00 | 8 |
| Human Motion Generation | HUMANISE | Goal Distance | 1.13 | 6 |
| Locomotion and scene-level interaction | TRUMANS | Cont. | 0.931 | 5 |