Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

About

This paper addresses the challenge of 3D human pose estimation from a single color image. Despite the general success of the end-to-end learning paradigm, top-performing approaches employ a two-step solution consisting of a Convolutional Network (ConvNet) for 2D joint localization and a subsequent optimization step to recover 3D pose. In this paper, we identify the representation of 3D pose as a critical issue with current ConvNet approaches and make two important contributions towards validating the value of end-to-end learning for this task. First, we propose a fine discretization of the 3D space around the subject and train a ConvNet to predict per-voxel likelihoods for each joint. This creates a natural representation for 3D pose and greatly improves performance over the direct regression of joint coordinates. Second, to further improve upon initial estimates, we employ a coarse-to-fine prediction scheme. This step addresses the large dimensionality increase and enables iterative refinement and repeated processing of the image features. The proposed approach outperforms all state-of-the-art methods on standard benchmarks, achieving a relative error reduction greater than 30% on average. Additionally, we investigate using our volumetric representation in a related architecture, which is suboptimal compared to our end-to-end approach but is of practical interest, since it enables training when no image with corresponding 3D ground truth is available and allows us to present compelling results for in-the-wild images.
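The volumetric representation described above can be illustrated with a small sketch. It is not the authors' implementation: the function names, the metric bounds of the volume, the Gaussian target, and the list of depth resolutions are all assumptions chosen for illustration. The sketch turns one 3D joint into a per-voxel target volume, decodes a joint back from a volume by taking the highest-scoring voxel, and builds targets at several depth resolutions to suggest how a coarse-to-fine scheme might be supervised.

```python
import numpy as np


def joint_to_volume(joint_xyz, bounds_min, bounds_max, resolution, sigma=1.0):
    """Build a (nx, ny, nz) target volume with a Gaussian centered on a joint.

    All names and defaults here are hypothetical. `sigma` is in voxel units.
    """
    joint_xyz = np.asarray(joint_xyz, dtype=float)
    bounds_min = np.asarray(bounds_min, dtype=float)
    bounds_max = np.asarray(bounds_max, dtype=float)
    res = np.asarray(resolution, dtype=float)

    # Joint position in voxel-index units; voxel i has its center at i + 0.5.
    center = (joint_xyz - bounds_min) / (bounds_max - bounds_min) * res - 0.5

    gx, gy, gz = np.meshgrid(*[np.arange(n) for n in resolution], indexing="ij")
    sq_dist = (gx - center[0]) ** 2 + (gy - center[1]) ** 2 + (gz - center[2]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def volume_to_joint(volume, bounds_min, bounds_max):
    """Decode a joint as the metric center of the highest-scoring voxel."""
    bounds_min = np.asarray(bounds_min, dtype=float)
    bounds_max = np.asarray(bounds_max, dtype=float)
    idx = np.array(np.unravel_index(np.argmax(volume), volume.shape), dtype=float)
    res = np.array(volume.shape, dtype=float)
    return bounds_min + (idx + 0.5) / res * (bounds_max - bounds_min)


# Toy usage: a 2 m cube around the subject (assumed), one joint in meters.
lo, hi = (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)
joint = (0.1, -0.2, 0.3)

fine = joint_to_volume(joint, lo, hi, (64, 64, 64))
print(volume_to_joint(fine, lo, hi))  # within half a voxel of `joint`

# Coarse-to-fine targets: same x-y grid, increasing depth resolution (assumed values).
for depth in (1, 2, 4, 8, 16, 32, 64):
    target = joint_to_volume(joint, lo, hi, (64, 64, depth))
    print(depth, target.shape)
```

Decoding by arg-max bounds the localization error by half a voxel per axis, which is one way to see why a fine discretization matters and why the growth in the number of voxels motivates starting from a coarse depth resolution.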

Georgios Pavlakos, Xiaowei Zhou, Konstantinos G. Derpanis, Kostas Daniilidis • 2016

Related benchmarks

Task	Dataset	Metric	Value	
3D Human Pose Estimation	Human3.6M (test)	MPJPE (Average)	41.8	570
3D Human Pose Estimation	Human3.6M (Protocol #1)	MPJPE (Avg.)	51.9	457
3D Human Pose Estimation	Human3.6M (Protocol 2)	Average MPJPE	41.8	315
3D Human Pose Estimation	Human3.6M	--	--	193
3D Human Pose Estimation	Human3.6M Protocol 1 (test)	Dir. Error (Protocol 1)	67.4	183
3D Human Pose Estimation	Human3.6M (subjects 9 and 11)	Average Error	41.5	180
3D Human Pose Estimation	Human3.6M Protocol #2 (test)	Average Error	41.8	140
3D Human Pose Estimation	HumanEva-I (test)	Walking S1 Error (mm)	22.1	85
3D Human Pose Estimation	Human3.6M S9 and S11 (test)	Dir. Error	67.4	72
3D Human Pose Estimation	Human3.6M v1 (test)	Avg Performance	71.9	58

Showing 10 of 23 rows
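
Several rows above report MPJPE (mean per-joint position error). As a minimal sketch of how that metric is commonly computed, the function below averages the Euclidean distance between corresponding predicted and ground-truth joints. The function name and the toy poses are hypothetical, and the sketch leaves out protocol-specific steps such as any alignment applied before measuring error, so it will not by itself reproduce the numbers in the table.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between matching joints, in the input units."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("poses must be non-empty and have the same number of joints")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Toy two-joint poses in millimeters (made-up values, not benchmark data).
pred = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred, gt))  # → 2.5
```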
