Towards Viewpoint Invariant 3D Human Pose Estimation
About
We propose a viewpoint invariant model for 3D human pose estimation from a single depth image. To achieve this, our discriminative model embeds local regions into a learned viewpoint invariant feature space. Formulated as a multi-task learning problem, our model is able to selectively predict partial poses in the presence of noise and occlusion. Our approach leverages a convolutional and recurrent network architecture with a top-down error feedback mechanism to self-correct previous pose estimates in an end-to-end manner. We evaluate our model on a previously published depth dataset and a newly collected human pose dataset containing 100K annotated depth images from extreme viewpoints. Experiments show that our model achieves competitive performance on frontal views and state-of-the-art performance on alternate viewpoints.
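The top-down error feedback mechanism described above can be sketched as an iterative refinement loop: the current pose estimate is fed back into the predictor, which outputs a correction. The sketch below is a minimal illustration of that control flow only; `embed` and `predict` are hypothetical stand-ins for the learned convolutional embedding and recurrent predictor, not the paper's actual networks.

```python
import numpy as np

def refine_pose(embed, predict, depth_patch, n_steps=3, n_joints=15):
    """Iteratively refine a 3D pose estimate via top-down error feedback.

    At each step the previous estimate is fed back so the model can
    self-correct. `embed` and `predict` are placeholders for the learned
    embedding and recurrent predictor (hypothetical, for illustration).
    """
    pose = np.zeros((n_joints, 3))        # initial guess: all joints at origin
    for _ in range(n_steps):
        feat = embed(depth_patch)         # viewpoint-invariant local features
        correction = predict(feat, pose)  # update conditioned on prior estimate
        pose = pose + correction          # additive self-correction
    return pose
```

With a toy predictor that moves the estimate halfway toward a target pose each step, three feedback iterations already recover most of the target, which is the intuition behind self-correcting previous estimates rather than predicting the pose in one shot.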
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Human Pose Estimation | ITOP top-view | Head Accuracy | 98.1 | 23 |
| 3D Human Pose Estimation | ITOP front-view | Head Joint Accuracy | 98.1 | 22 |
| 3D Human Pose Estimation | ITOP front-view 1.0 | Head Accuracy | 98.1 | 4 |
| Body Part Detection | Viewpoint Transfer Task Dataset (test) | Head Detection Rate | 55.6 | 4 |
| 3D Human Pose Estimation | ITOP top-view 1.0 | Head | 98.1 | 4 |
| 3D Human Pose Estimation | EVAL cross-view | Head | 0.939 | 2 |