Deep High-Resolution Representation Learning for Human Pose Estimation
About
This is the official PyTorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from the other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. The code and models are publicly available at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch.
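To make the idea of multi-scale fusion between parallel branches more concrete, here is a minimal sketch in plain Python. It is not the repository's code. It assumes single-channel feature maps stored as nested lists, a factor of 2 between adjacent resolutions, average pooling for downsampling, nearest-neighbour repetition for upsampling, and plain summation for fusion. The function names (`downsample`, `upsample`, `resize`, `fuse`) are hypothetical, and the actual network uses learned convolutions in place of these fixed operations.

```python
# Toy illustration of one multi-scale fusion step between parallel branches.
# Assumptions (not from the source): single-channel maps as nested lists,
# a factor of 2 between adjacent resolutions, average pooling for
# downsampling, nearest-neighbour upsampling, and summation for fusion.
# The real network uses learned convolutions instead of these fixed ops.


def downsample(fm, factor):
    """Average-pool a 2D map by an integer factor."""
    h, w = len(fm) // factor, len(fm[0]) // factor
    return [
        [
            sum(
                fm[i * factor + di][j * factor + dj]
                for di in range(factor)
                for dj in range(factor)
            ) / factor ** 2
            for j in range(w)
        ]
        for i in range(h)
    ]


def upsample(fm, factor):
    """Nearest-neighbour upsample a 2D map by an integer factor."""
    return [
        [fm[i // factor][j // factor] for j in range(len(fm[0]) * factor)]
        for i in range(len(fm) * factor)
    ]


def resize(fm, src_level, dst_level):
    """Bring a map from branch src_level to the resolution of dst_level.

    A higher level index means a lower resolution.
    """
    if src_level == dst_level:
        return fm
    factor = 2 ** abs(dst_level - src_level)
    if dst_level > src_level:
        return downsample(fm, factor)
    return upsample(fm, factor)


def fuse(branches):
    """Each output branch is the sum of all input branches resized to it."""
    fused = []
    for dst in range(len(branches)):
        resized = [resize(fm, src, dst) for src, fm in enumerate(branches)]
        h, w = len(resized[0]), len(resized[0][0])
        fused.append(
            [[sum(r[i][j] for r in resized) for j in range(w)] for i in range(h)]
        )
    return fused


# Two parallel branches: 4x4 (high resolution) and 2x2 (low resolution).
high = [[1.0] * 4 for _ in range(4)]
low = [[2.0] * 2 for _ in range(2)]
fused_high, fused_low = fuse([high, low])
```

After fusion, every branch keeps its own resolution but carries information from all the others. Repeating this step across stages is what the abstract calls repeated multi-scale fusion.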
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet-1K 1.0 (val) | Top-1 Accuracy | 79.5 | 1866 |
| Image Classification | ImageNet-1k (val) | Top-1 Acc | 79.3 | 706 |
| 3D Human Pose Estimation | Human3.6M (Protocol #1) | MPJPE (Avg.) | 53.2 | 440 |
| Human Pose Estimation | COCO (test-dev) | AP | 77 | 408 |
| 2D Human Pose Estimation | COCO 2017 (val) | AP | 77.4 | 386 |
| Pose Estimation | COCO (val) | AP | 78.1 | 319 |
| Human Pose Estimation | MPII (test) | Shoulder PCK | 96.9 | 314 |
| Human Pose Estimation | COCO 2017 (test-dev) | AP | 77 | 180 |
| Multi-person Pose Estimation | CrowdPose (test) | AP | 72.8 | 177 |
| Facial Landmark Detection | 300-W (Fullset) | Mean Error (%) | 3.34 | 174 |