A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation from a Single Depth Image
About
For the task of 3D hand and body pose estimation from depth images, a novel anchor-based approach termed Anchor-to-Joint regression network (A2J), with end-to-end learning ability, is proposed. Within A2J, anchor points able to capture global-local spatial context information are densely set on the depth image as local regressors for the joints. They predict the positions of the joints in an ensemble way, which enhances generalization ability. The proposed 3D articulated pose estimation paradigm differs from the state-of-the-art encoder-decoder-based FCN, 3D CNN, and point-set-based approaches. To discover the informative anchor points for a given joint, an anchor proposal procedure is also proposed for A2J. A 2D CNN (i.e., ResNet-50) is used as the backbone network to drive A2J, without time-consuming 3D convolutional or deconvolutional layers. Experiments on 3 hand datasets and 2 body datasets verify A2J's superiority. A2J also runs fast, at around 100 FPS on a single NVIDIA 1080Ti GPU.
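In the summary above, dense anchors act as local regressors, and an anchor proposal step selects which anchors are informative for each joint. The NumPy sketch below shows how such an ensemble could be aggregated. It assumes that each anchor outputs three things per joint: an in-plane offset, a depth value, and a response score. The response scores are softmax-normalized over anchors and used as ensemble weights. The 4-pixel stride, the function names (`make_anchor_grid`, `aggregate_joints`), and the array layouts are illustrative assumptions, not the authors' code.

```python
import numpy as np

def make_anchor_grid(height, width, stride=4):
    """Densely place anchors on the image plane, one every `stride` pixels.
    The default stride of 4 is an illustrative assumption."""
    xs = np.arange(stride / 2, width, stride)
    ys = np.arange(stride / 2, height, stride)
    grid_x, grid_y = np.meshgrid(xs, ys)
    return np.stack([grid_x.ravel(), grid_y.ravel()], axis=1)  # (A, 2) as (x, y)

def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_joints(anchors, responses, offsets, depths):
    """Combine per-anchor predictions into per-joint estimates (hypothetical layout).

    anchors:   (A, 2)    anchor pixel positions
    responses: (A, J)    how informative each anchor is for each joint
    offsets:   (A, J, 2) in-plane offset from each anchor to each joint
    depths:    (A, J)    depth predicted by each anchor for each joint
    returns:   (J, 3)    (x, y, depth) per joint
    """
    # Normalize responses over anchors, separately for each joint.
    weights = softmax(responses, axis=0)
    # Each anchor votes for position = anchor location + its predicted offset.
    votes = anchors[:, None, :] + offsets  # (A, J, 2)
    xy = np.einsum("aj,ajc->jc", weights, votes)
    z = np.einsum("aj,aj->j", weights, depths)
    return np.concatenate([xy, z[:, None]], axis=1)

# Usage: an 8x8 image with stride 4 gives 4 anchors at (2,2), (6,2), (2,6), (6,6).
anchors = make_anchor_grid(8, 8, stride=4)
num_joints = 3
uniform = aggregate_joints(
    anchors,
    responses=np.zeros((4, num_joints)),        # equal weight for every anchor
    offsets=np.zeros((4, num_joints, 2)),
    depths=np.full((4, num_joints), 500.0),
)
print(uniform)  # every joint lands at the anchor centroid (4, 4) with depth 500
```

Soft weighting keeps the whole aggregation differentiable. This fits the end-to-end training described above, because the anchor weights are learned together with the offsets and depths.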
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Hand Pose Estimation | NYU (test) | Mean Error (mm) | 8.61 | 100 |
| 3D Hand Pose Estimation | ICVL (test) | Mean Error (mm) | 6.46 | 91 |
| 3D Human Pose Estimation | ITOP top-view | Head Accuracy (%) | 98.38 | 23 |
| 3D Human Pose Estimation | ITOP front-view | Head Joint Accuracy (%) | 98.54 | 22 |
| 3D Hand Pose Estimation | NYU | Mean Distance Error (mm) | 8.61 | 19 |
| 3D Hand Pose Estimation | HANDS 2017 (test) | SEEN Error (mm) | 6.92 | 17 |
| 3D Hand Pose Estimation | ICVL | Mean Distance Error (mm) | 6.46 | 17 |
| 3D Hand Pose Estimation | HANDS frame-based challenge 2017 (test) | Avg 3D Error | 8.57 | 11 |
| 3D Hand Pose Estimation | NYU Hand Pose Dataset (test) | Mean Joint 3D Error (mm) | 8.61 | 11 |
| 3D Hand Pose Estimation | Dex-YCB (test) | PA-MPJPE (Scene 0) | 23.93 | 10 |
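The table mixes two kinds of metrics: mean 3D joint error in millimetres (NYU, ICVL, HANDS 2017) and per-joint accuracy (ITOP). The sketch below shows one common way to compute each, assuming predictions and ground truth are `(frames, joints, 3)` arrays in millimetres. The 100 mm (10 cm) threshold and the strict `<` comparison are assumptions about the ITOP protocol. Each benchmark's official evaluation script is authoritative. PA-MPJPE additionally applies a Procrustes alignment before measuring error, which this sketch does not show.

```python
import numpy as np

def mean_joint_error(pred, gt):
    """Mean Euclidean distance over all frames and joints.
    pred, gt: (N, J, 3) arrays, assumed to be in millimetres."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def per_joint_accuracy(pred, gt, threshold_mm=100.0):
    """Percentage of frames in which each joint lies within `threshold_mm`
    of ground truth. Returns a (J,) array; the threshold is an assumption."""
    dist = np.linalg.norm(pred - gt, axis=-1)  # (N, J)
    return (dist < threshold_mm).mean(axis=0) * 100.0

# Usage with two frames and two joints (values are made up for illustration).
gt = np.zeros((2, 2, 3))
pred = np.array([
    [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]],    # errors: 5 mm, 0 mm
    [[0.0, 0.0, 200.0], [0.0, 0.0, 50.0]],  # errors: 200 mm, 50 mm
])
print(mean_joint_error(pred, gt))    # (5 + 0 + 200 + 50) / 4
print(per_joint_accuracy(pred, gt))  # joint 0: 1 of 2 frames, joint 1: 2 of 2
```

To get a single-joint figure such as "Head Accuracy", index the returned array with that joint's position in the dataset's joint ordering.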