Flexible Motion In-betweening with Diffusion Models
About
Motion in-betweening, a fundamental task in character animation, consists of generating motion sequences that plausibly interpolate user-provided keyframe constraints. It has long been recognized as a labor-intensive and challenging process. We investigate the potential of diffusion models in generating diverse human motions guided by keyframes. Unlike previous in-betweening methods, we propose a simple unified model capable of generating precise and diverse motions that conform to a flexible range of user-specified spatial constraints, as well as text conditioning. To this end, we introduce Conditional Motion Diffusion In-betweening (CondMDI), which allows for arbitrary dense-or-sparse keyframe placement and partial keyframe constraints while generating high-quality motions that are diverse and coherent with the given keyframes. We evaluate the performance of CondMDI on the text-conditioned HumanML3D dataset and demonstrate the versatility and efficacy of diffusion models for keyframe in-betweening. We further explore the use of guidance and imputation-based approaches for inference-time keyframing and compare CondMDI against these methods.
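To make the keyframe terminology concrete, the following is a minimal sketch of how sparse, dense, and partial keyframe constraints could be encoded as a binary mask, and of the generic imputation step that the abstract mentions as a point of comparison. It is not the CondMDI implementation: the function names `make_keyframe_mask` and `impute`, the `(frames, features)` motion layout, and the noise-schedule value `alpha_bar_t` are all assumptions made for illustration.

```python
import numpy as np

def make_keyframe_mask(num_frames, num_features, keyframes, feature_idx=None):
    """Hypothetical helper: binary mask with 1 where a value is constrained.

    keyframes   -- frame indices that carry a constraint (sparse or dense)
    feature_idx -- optional feature indices for a partial keyframe
                   (e.g. only a root trajectory); None constrains all features
    """
    mask = np.zeros((num_frames, num_features), dtype=np.float32)
    cols = slice(None) if feature_idx is None else np.asarray(feature_idx)
    for t in keyframes:
        mask[t, cols] = 1.0
    return mask

def impute(x_t, keyframe_values, mask, alpha_bar_t, rng):
    """Generic imputation step (assumed form): overwrite constrained entries
    of the noisy sample x_t with the keyframes noised to the same level."""
    noise = rng.standard_normal(x_t.shape)
    noised = np.sqrt(alpha_bar_t) * keyframe_values + np.sqrt(1.0 - alpha_bar_t) * noise
    return mask * noised + (1.0 - mask) * x_t

# Toy usage: 8 frames, 6 features per frame (sizes are arbitrary).
rng = np.random.default_rng(0)
T, D = 8, 6
target = rng.standard_normal((T, D))

# Sparse full-pose keyframes at the first and last frame.
mask = make_keyframe_mask(T, D, keyframes=[0, 7])
# Partial keyframe: only features 0..2 are constrained at frame 3.
mask_partial = make_keyframe_mask(T, D, keyframes=[3], feature_idx=[0, 1, 2])

x_t = rng.standard_normal((T, D))
# alpha_bar_t = 1.0 corresponds to a noise-free step, so the keyframes
# are copied in exactly and every other entry keeps its sampled value.
x_imp = impute(x_t, target, mask, alpha_bar_t=1.0, rng=rng)
```

In a sampler, a step like `impute` would be applied after each denoising update so that the constrained entries stay anchored to the keyframes while the model fills in the remaining frames.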
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human Motion Editing | HumanML3D (test) | FID | 0.247 | 15 |
| Motion In-betweening | 350k dataset | FPS | 1.93e+3 | 13 |
| Temporal Inpainting (Backcasting) | HumanML3D | MPJPE | 2.72 | 10 |
| Temporal Inpainting (Prediction) | HumanML3D | MPJPE | 10 | 10 |
| Geometric-Constrained Motion Generation | Geometric-Constrained Generation | Trajectory Error | 70.46 | 8 |
| Motion Generation | HumanML3D | MMD | 0.1101 | 7 |
| Motion Generation | Bones-70k | MMD | 0.1062 | 7 |
| Motion Generation | LaFAN1 G1 | MMD | 0.286 | 7 |
| Motion In-betweening | Motion In-betweening (test) | L2P Error | 3.396 | 4 |