Convolutional Pose Machines
About
Pose Machines provide a sequential prediction framework for learning rich implicit spatial models. In this work we show a systematic design for how convolutional networks can be incorporated into the pose machine framework for learning image features and image-dependent spatial models for the task of pose estimation. The contribution of this paper is to implicitly model long-range dependencies between variables in structured prediction tasks such as articulated pose estimation. We achieve this by designing a sequential architecture composed of convolutional networks that directly operate on belief maps from previous stages, producing increasingly refined estimates for part locations, without the need for explicit graphical model-style inference. Our approach addresses the characteristic difficulty of vanishing gradients during training by providing a natural learning objective function that enforces intermediate supervision, thereby replenishing back-propagated gradients and conditioning the learning procedure. We demonstrate state-of-the-art performance and outperform competing methods on standard benchmarks including the MPII, LSP, and FLIC datasets.
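The staged prediction and intermediate supervision described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. Each stage is an arbitrary callable: the hypothetical toy functions `toy_first_stage` and `toy_later_stage` stand in for the convolutional networks. Belief maps are plain nested lists. The ideal target map is assumed to be a Gaussian peak at the ground-truth location, with an illustrative `sigma`. The per-stage cost is a squared L2 error summed over parts and locations. All function names are illustrative. The total objective adds up the cost of every stage, which is the intermediate supervision the abstract credits with replenishing back-propagated gradients.

```python
import math
from typing import Callable, List, Tuple

Grid = List[List[float]]   # one H x W belief map
Beliefs = List[Grid]       # one belief map per part p

def ideal_belief(h: int, w: int, cy: int, cx: int, sigma: float = 1.0) -> Grid:
    """Assumed target map b_*^p: a Gaussian peak centred on the ground-truth location."""
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]

def stage_loss(pred: Beliefs, target: Beliefs) -> float:
    """Per-stage cost f_t: squared L2 error summed over parts p and locations z."""
    return sum((b - t) ** 2
               for pm, tm in zip(pred, target)
               for prow, trow in zip(pm, tm)
               for b, t in zip(prow, trow))

def run_stages(features: Beliefs,
               first_stage: Callable[[Beliefs], Beliefs],
               later_stage: Callable[[Beliefs, Beliefs], Beliefs],
               num_stages: int) -> List[Beliefs]:
    """Stage 1 sees only image features; every later stage also sees the previous beliefs."""
    outputs = [first_stage(features)]
    for _ in range(1, num_stages):
        outputs.append(later_stage(features, outputs[-1]))
    return outputs

def total_loss(outputs: List[Beliefs], target: Beliefs) -> float:
    """Intermediate supervision: sum f_t over every stage, not only the last one."""
    return sum(stage_loss(b, target) for b in outputs)

def argmax_location(belief: Grid) -> Tuple[int, int]:
    """Predicted part location: the (row, col) cell with the highest belief."""
    return max(((y, x) for y in range(len(belief)) for x in range(len(belief[0]))),
               key=lambda yx: belief[yx[0]][yx[1]])

# --- Hypothetical toy stages standing in for the convolutional networks ---
def toy_first_stage(features: Beliefs) -> Beliefs:
    # Copy the per-part evidence maps unchanged.
    return [[row[:] for row in m] for m in features]

def toy_later_stage(features: Beliefs, prev: Beliefs) -> Beliefs:
    # Multiply evidence by the previous belief and rescale so the peak is 1,
    # which sharpens each map around its strongest response.
    out = []
    for fm, pm in zip(features, prev):
        prod = [[f * p for f, p in zip(frow, prow)] for frow, prow in zip(fm, pm)]
        peak = max(max(row) for row in prod) or 1.0  # avoid dividing by zero
        out.append([[v / peak for v in row] for row in prod])
    return out

# Usage: one part on a 5 x 5 grid, true location (2, 3), a distractor at (0, 0).
target = [ideal_belief(5, 5, 2, 3)]
evidence = [[[0.1] * 5 for _ in range(5)]]
evidence[0][2][3] = 0.9
evidence[0][0][0] = 0.5
outputs = run_stages(evidence, toy_first_stage, toy_later_stage, num_stages=3)
print(len(outputs), argmax_location(outputs[-1][0]))  # 3 (2, 3)
```

Summing the cost over stages, rather than supervising only the final output, gives every stage its own training signal. In a real model the toy stages would be learned convolutional networks trained by back-propagation on this summed objective.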
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Human Pose Estimation | MPII (test) | Shoulder PCK | 95 | 314 |
| Human Pose Estimation | LSP (test) | Head Accuracy | 97.8 | 102 |
| 2D Human Pose Estimation | MPII (val) | Head | 96.2 | 61 |
| Human Pose Estimation | J-HMDB sub | Head Accuracy | 98.4 | 49 |
| 3D Pose Estimation | Total Capture (test) | Mean MPJPE | 99 | 42 |
| Human Pose Estimation | MPII | Head Accuracy | 97.8 | 32 |
| Pose Estimation | Penn Action Dataset (test) | Head | 98.6 | 19 |
| Human Pose Estimation | LSP PC annotations (test) | Torso Accuracy | 0.98 | 16 |
| Multi-person Pose Estimation | Multi-Person PoseTrack | Head Accuracy | 0.488 | 15 |
| Human Pose Estimation | MPII pose 03/15/2018 (full) | Head Accuracy | 97.8 | 11 |
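Two of the metrics named in the table can be illustrated with a short sketch. This assumes the common definitions: PCK counts a joint as correct when its predicted location lies within `alpha` times a reference length of the ground truth, and MPJPE is the mean Euclidean distance between predicted and ground-truth joints. The choice of reference length (for example head size or torso size) and of `alpha` varies by benchmark, so both are left as parameters. The function names are illustrative and not taken from any benchmark's evaluation code.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]  # a 2D or 3D joint position

def pck(pred: Sequence[Point], gt: Sequence[Point],
        ref_length: float, alpha: float) -> float:
    """Fraction of joints predicted within alpha * ref_length of the ground truth."""
    threshold = alpha * ref_length
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return hits / len(gt)

def mpjpe(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Mean per-joint position error: average Euclidean distance over joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

# Usage: 2D joints at distances 0, 5 and 10 from the ground truth.
pred_2d = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
gt_2d = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
print(pck(pred_2d, gt_2d, ref_length=20.0, alpha=0.5))  # 1.0

# 3D joints at distances 0 and 3; units follow the input coordinates.
print(mpjpe([(0.0, 0.0, 0.0), (1.0, 2.0, 2.0)],
            [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]))  # 1.5
```

Because the threshold scales with a per-image reference length, PCK is comparable across people of different apparent sizes, whereas MPJPE is an absolute error in the units of the coordinates.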