3D Human Pose Estimation with 2D Marginal Heatmaps
About
Automatically determining three-dimensional human pose from monocular RGB image data is a challenging problem. The two-dimensional nature of the input results in intrinsic ambiguities which make inferring depth particularly difficult. Recently, researchers have demonstrated that the flexible statistical modelling capabilities of deep neural networks are sufficient to make such inferences with reasonable accuracy. However, many of these models use coordinate output techniques which are memory-intensive, not differentiable, and/or do not spatially generalise well. We propose improvements to 3D coordinate prediction which avoid the aforementioned undesirable traits by predicting 2D marginal heatmaps under an augmented soft-argmax scheme. Our resulting model, MargiPose, produces visually coherent heatmaps whilst maintaining differentiability. We are also able to achieve state-of-the-art accuracy on publicly available 3D human pose estimation data.
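As a rough illustration of the idea in the abstract, the following NumPy sketch reads a 3D joint location out of 2D marginal heatmaps with a plain soft-argmax (the expected coordinate under a normalised heatmap). It is not the authors' implementation: the function names, the choice of an xy and an xz heatmap, the softmax normalisation, and the [-1, 1] coordinate range are all assumptions made for this example, and the "augmented" part of the scheme is not reproduced here.

```python
import numpy as np

def softmax2d(logits):
    """Normalise a 2D array of logits into a probability map that sums to 1."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def soft_argmax2d(heatmap):
    """Expected (column, row) location of a normalised heatmap, each in [-1, 1]."""
    rows, cols = heatmap.shape
    row_coords = np.linspace(-1.0, 1.0, rows)
    col_coords = np.linspace(-1.0, 1.0, cols)
    # Marginalise over the other axis, then take the expectation.
    col = (heatmap.sum(axis=0) * col_coords).sum()
    row = (heatmap.sum(axis=1) * row_coords).sum()
    return col, row

def joint_from_marginals(xy_logits, xz_logits):
    """Combine two hypothetical marginal heatmaps into one (x, y, z) estimate.

    Assumption: xy_logits has rows indexed by y and columns by x, and
    xz_logits has rows indexed by z and columns by x.
    """
    x, y = soft_argmax2d(softmax2d(xy_logits))
    _, z = soft_argmax2d(softmax2d(xz_logits))
    return np.array([x, y, z])

# Usage: sharply peaked toy heatmaps for a single joint.
xy_logits = np.zeros((5, 5))
xy_logits[0, 4] = 50.0  # peak at top row (y = -1), rightmost column (x = 1)
xz_logits = np.zeros((5, 5))
xz_logits[2, 4] = 50.0  # peak at middle row (z = 0), rightmost column (x = 1)
joint = joint_from_marginals(xy_logits, xz_logits)
```

Because every step is a sum or an exponential, the output coordinates are differentiable with respect to the logits, and storing a few 2D maps per joint is cheaper than a full 3D volume.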
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 3D Human Pose Estimation | Human3.6M (test) | MPJPE (Average): 55.4 | 547 |
| 3D Human Pose Estimation | Human3.6M (subjects 9 and 11) | -- | 180 |
| 3D Human Pose Estimation | H3.6M Protocol 1 (subjects 9 and 11) | Avg Error: 57 | 18 |
| 3D Pose Estimation | MPI-INF-3DHP single-person capture | 3DPCK: 87.6 | 13 |
| 3D Human Pose Estimation | H3.6M Protocol 2 (subject 11) | P-MPJPE: 40.4 | 11 |
| Human Pose Estimation | Human3.6m synthetic event version (cross-subject test) | MPJPE: 57 | 8 |
| 3D Human Pose Estimation | MPI-INF-3DHP Universal, height-normalized skeletons 1.0/2.0 (test) | -- | 8 |
| 3D Pose Estimation | MPI-INF-3DHP uncorrected labels (test) | PCK: 85.4 | 5 |