ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild
About
This paper investigates the task of 2D whole-body human pose estimation, which aims to localize dense landmarks on the entire human body including body, feet, face, and hands. We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts. We further propose a neural architecture search framework, termed ZoomNAS, to promote both the accuracy and efficiency of whole-body pose estimation. ZoomNAS jointly searches the model architecture and the connections between different sub-modules, and automatically allocates computational complexity for searched sub-modules. To train and evaluate ZoomNAS, we introduce the first large-scale 2D human whole-body dataset, namely COCO-WholeBody V1.0, which annotates 133 keypoints for in-the-wild images. Extensive experiments demonstrate the effectiveness of ZoomNAS and the significance of COCO-WholeBody V1.0.
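As a rough illustration of what the 133 keypoints cover, the sketch below splits a flat whole-body keypoint list into body, feet, face, and hand groups. The per-part counts and their ordering (17 body, 6 feet, 68 face, 21 per hand) are an assumption taken from the publicly released COCO-WholeBody annotation format rather than from the text above, and the names `PART_SIZES`, `part_slices`, and `split_keypoints` are hypothetical helpers, not part of any released code.

```python
# Assumed part layout of a 133-keypoint whole-body annotation.
# Counts and order are an assumption based on the public COCO-WholeBody
# annotation format; only the total of 133 is stated in the abstract.
PART_SIZES = [
    ("body", 17),
    ("feet", 6),
    ("face", 68),
    ("left_hand", 21),
    ("right_hand", 21),
]


def part_slices(part_sizes=PART_SIZES):
    """Map each part name to the slice it occupies in the flat keypoint list."""
    slices = {}
    start = 0
    for name, size in part_sizes:
        slices[name] = slice(start, start + size)
        start += size
    return slices


def split_keypoints(keypoints, part_sizes=PART_SIZES):
    """Split a flat list of keypoints into one list per body part."""
    total = sum(size for _, size in part_sizes)
    if len(keypoints) != total:
        raise ValueError(f"expected {total} keypoints, got {len(keypoints)}")
    return {name: keypoints[s] for name, s in part_slices(part_sizes).items()}


if __name__ == "__main__":
    # Dummy (x, y) keypoints, one per index, just to show the grouping.
    dummy = [(float(i), float(i)) for i in range(133)]
    for name, points in split_keypoints(dummy).items():
        print(name, len(points))
```

Grouping the keypoints this way makes the scale problem concrete: the face and hand groups hold most of the landmarks but occupy a small fraction of the person's bounding box.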
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Facial Landmark Detection | AFLW Full | NME | 1.42 | 101 |
| Facial Landmark Detection | COFW (test) | NME | 3.36 | 93 |
| Whole-body Pose Estimation | COCO-Wholebody 1.0 (val) | Body AP | 74 | 64 |
| Face Alignment | AFLW Frontal | NME (%) | 1.27 | 22 |
| Whole-body Pose Estimation | COCO-WholeBody 1.0 | Whole-body AP | 65.4 | 20 |
| Pose Estimation | Humans-5K (test) | Body AP | 59.7 | 13 |
| Whole-body Pose Estimation | COCO-WholeBody V1.0 (test) | Body AP | 74.5 | 10 |
| Hand Pose Estimation | WholeBody-Hand (WBH) (test) | PCK (%) | 80.2 | 7 |
| Facial Landmark Detection | WholeBody-Face (WBF) | -- | -- | 7 |
| 2D Hand Pose Estimation | Panoptic 7 (test) | PCK | 99.9 | 4 |
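
For readers unfamiliar with the NME and PCK figures in the table, the following is a minimal sketch of how the two metrics are commonly computed for a single instance. The normalizing length (for example a bounding-box size or inter-ocular distance) and the PCK threshold differ between benchmarks, so `norm_length` and `threshold` here are illustrative assumptions and not the settings used for the results above; the function names are hypothetical.

```python
import math


def _distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def pck(pred, gt, norm_length, threshold=0.2):
    """Fraction of keypoints whose error is within threshold * norm_length.

    threshold=0.2 is an illustrative default, not a benchmark setting.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    limit = threshold * norm_length
    hits = sum(1 for p, g in zip(pred, gt) if _distance(p, g) <= limit)
    return hits / len(gt)


def nme(pred, gt, norm_length):
    """Mean keypoint error divided by a normalizing length."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and the same length")
    total = sum(_distance(p, g) for p, g in zip(pred, gt))
    return total / (len(gt) * norm_length)


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (3.0, 4.0)]
    ground_truth = [(0.0, 0.0), (0.0, 0.0)]
    print(pck(predicted, ground_truth, norm_length=10.0))
    print(nme(predicted, ground_truth, norm_length=10.0))
```

Lower is better for NME and higher is better for PCK, which is worth keeping in mind when comparing rows that report different metrics.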