Stacked Hourglass Networks for Human Pose Estimation
About
This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a "stacked hourglass" network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks, outcompeting all recent methods.
Alejandro Newell, Kaiyu Yang, Jia Deng • 2016
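The repeated pooling and upsampling described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `block` is a hypothetical identity stand-in for the learned convolutional layers, nearest-neighbor upsampling and a squared-error loss are assumed, and every function name here is invented for the example.

```python
# Illustrative sketch only: plain-Python stand-ins for learned layers.
# All names below are hypothetical and not taken from the paper's code.

def max_pool2(x):
    """2x2 max pooling with stride 2 on a 2D list (even height/width assumed)."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]

def upsample2(x):
    """Nearest-neighbor upsampling by a factor of 2 (an assumption)."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add(a, b):
    """Element-wise sum of two equally sized 2D lists."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def block(x):
    """Placeholder for a learned convolutional block (identity copy here)."""
    return [list(row) for row in x]

def hourglass(x, depth):
    """One hourglass: pool down `depth` times, then upsample back up,
    merging a skip branch at every resolution."""
    skip = block(x)                    # branch kept at the current scale
    low = block(max_pool2(x))          # bottom-up: halve the resolution
    if depth > 1:
        low = hourglass(low, depth - 1)
    else:
        low = block(low)
    return add(skip, upsample2(low))   # top-down: restore scale and merge

def stacked_hourglass(x, num_stacks, depth):
    """Chain hourglasses end to end, keeping one prediction per stack."""
    predictions = []
    for _ in range(num_stacks):
        x = hourglass(x, depth)
        predictions.append(x)
    return predictions

def mse(pred, target):
    """Mean squared error between two equally sized 2D lists."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2
               for rp, rt in zip(pred, target)
               for p, t in zip(rp, rt)) / n

def intermediate_supervision_loss(predictions, target):
    """Sum a loss over every stack's output, not only the last one."""
    return sum(mse(p, target) for p in predictions)
```

Summing the loss over every stack's output is what the sketch means by intermediate supervision: each hourglass is trained against the target directly rather than only through the final prediction.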
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Human Pose Estimation | COCO (test-dev) | AP 66.9 | 408 |
| 2D Human Pose Estimation | COCO 2017 (val) | AP 66.9 | 386 |
| Pose Estimation | COCO (val) | -- | 319 |
| Human Pose Estimation | MPII (test) | Shoulder PCK 96.3 | 314 |
| Human Pose Estimation | LSP (test) | Head Accuracy 98.2 | 102 |
| Facial Landmark Detection | AFLW Full | NME 0.0195 | 101 |
| Multi-person Pose Estimation | COCO 2017 (test-dev) | AP 56.6 | 99 |
| 2D Human Pose Estimation | MPII (val) | Head 97.44 | 61 |
| Animal Pose Estimation | AP-10K (test) | mAP 72.9 | 55 |
| Facial Landmark Detection | 300-W public Challenging inter-pupil normalization (test) | NME 7.56 | 46 |