Action2Motion: Conditioned Generation of 3D Human Motions
About
Action recognition is a relatively established task, where given an input sequence of human motion, the goal is to predict its action category. This paper, on the other hand, considers a relatively new problem, which could be thought of as an inverse of action recognition: given a prescribed action type, we aim to generate plausible human motion sequences in 3D. Importantly, the set of generated motions is expected to maintain its diversity so that it can explore the entire action-conditioned motion space; meanwhile, each sampled sequence should faithfully resemble natural human body articulation dynamics. Motivated by these objectives, we follow the physics law of human kinematics by adopting Lie algebra theory to represent natural human motions; we also propose a temporal Variational Auto-Encoder (VAE) that encourages diverse sampling of the motion space. A new 3D human motion dataset, HumanAct12, is also constructed. Empirical experiments over three distinct human motion datasets (including ours) demonstrate the effectiveness of our approach.
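The abstract mentions representing motion with Lie algebra theory but gives no detail. As a rough illustration only, and not the authors' implementation, the sketch below shows the standard exponential map from so(3) to SO(3) (Rodrigues' formula), which turns an axis-angle vector into a rotation matrix. The helper names (`hat`, `exp_so3`, `bone_end`) and the idea of a bone with a fixed length and rest direction are assumptions made for this example.

```python
import math

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix, an element of so(3)."""
    wx, wy, wz = w
    return [[0.0, -wz, wy],
            [wz, 0.0, -wx],
            [-wy, wx, 0.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_so3(w, eps=1e-8):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula.

    w is an axis-angle vector: its direction is the rotation axis and
    its norm is the rotation angle in radians.
    """
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    K2 = matmul(K, K)
    if theta < eps:
        # Limits of the coefficients for very small angles.
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

def rotate(R, v):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def bone_end(parent_pos, w, rest_dir, length):
    """Hypothetical helper: child joint position for a bone of fixed length,
    obtained by rotating the bone's rest direction by exp(w)."""
    d = rotate(exp_so3(w), rest_dir)
    return [p + length * c for p, c in zip(parent_pos, d)]

# A quarter turn about the z-axis moves a bone pointing along x to point along y.
end = bone_end([0.0, 0.0, 0.0], [0.0, 0.0, math.pi / 2], [1.0, 0.0, 0.0], 2.0)
```

One appeal of a rotation-based parameterization like this is that the bone length stays constant no matter which rotation parameters are chosen, which is one way to keep generated poses anatomically plausible.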
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Human Motion Generation | HumanAct12 | FID | 0.21 | 36 |
| 3D Human Motion Generation | UESTC | Accuracy | 91.07 | 14 |
| Action-conditioned Motion Generation | CMU MOCAP | Accuracy | 76.33 | 10 |
| Unconditional human motion synthesis | HumanAct12 | FID | 49.76 | 7 |
| Future motion prediction | NTU RGB-D (test) | ADEw | 0.78 | 5 |
| Future motion prediction | BABEL (test) | ADEw | 1.25 | 5 |
| Future motion prediction | GRAB (test) | ADEw | 1.92 | 5 |
| Human Motion Prediction | GRAB (test) | Accuracy | 70.6 | 5 |
| Human Motion Prediction | NTU RGB-D (test) | Accuracy | 66.3 | 5 |
| Human Motion Prediction | BABEL (test) | Accuracy | 14.8 | 5 |