Cross View Fusion for 3D Human Pose Estimation
About
We present an approach to recover absolute 3D human poses from multi-view images by incorporating multi-view geometric priors into our model. It consists of two steps: (1) estimating the 2D poses in multi-view images and (2) recovering the 3D poses from the multi-view 2D poses. First, we introduce a cross-view fusion scheme into a CNN to jointly estimate 2D poses for multiple views, so that the 2D pose estimate for each view already benefits from the other views. Second, we present a recursive Pictorial Structure Model to recover the 3D pose from the multi-view 2D poses; it progressively improves the accuracy of the 3D pose at an affordable computational cost. We test our method on two public datasets, H36M and Total Capture. The Mean Per Joint Position Errors on the two datasets are 26mm and 29mm, outperforming the previous state of the art by a large margin (26mm vs. 52mm, 29mm vs. 35mm). Our code is released at https://github.com/microsoft/multiview-human-pose-estimation-pytorch.
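To make step (2) concrete, the standard baseline for lifting multi-view 2D detections to 3D (which the paper's recursive Pictorial Structure Model refines) is linear triangulation: each calibrated camera contributes two linear constraints on the 3D joint, and the least-squares solution is read off an SVD. The sketch below is a generic DLT triangulation, not the authors' code; the function name and the synthetic camera matrices in the usage example are illustrative assumptions.

```python
import numpy as np

def triangulate_joint(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint from N calibrated views.

    proj_mats : list of (3, 4) camera projection matrices
    points_2d : (N, 2) array of 2D joint detections, one per view
    Returns the 3D joint position (in world coordinates) as a length-3 array.
    """
    A = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view adds two homogeneous linear constraints on X:
        # u * (P[2] @ X) = P[0] @ X  and  v * (P[2] @ X) = P[1] @ X
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    A = np.asarray(A)
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize


if __name__ == "__main__":
    # Two toy cameras: identity projection and a unit translation along x.
    P1 = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
    P2 = np.array([[1., 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]])
    # 2D projections of the world point (1, 2, 5) in each camera.
    pts = np.array([[0.2, 0.4], [0.4, 0.4]])
    print(triangulate_joint([P1, P2], pts))  # → [1. 2. 5.]
```

With noisy detections the SVD still returns the least-squares point, but it weights all views equally; the paper's recursive PSM instead searches a discretized 3D space with body-structure priors, which is more robust to 2D errors in individual views.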
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Human Pose Estimation | MPI-INF-3DHP (test) | PCK | 23.3 | 559 |
| 3D Human Pose Estimation | Human3.6M (test) | MPJPE (Average) | 26.21 | 547 |
| 3D Human Pose Estimation | Human3.6M Protocol 1 (test) | Dir. Error (Protocol 1) | 24 | 183 |
| 3D Human Pose Estimation | Human3.6M (subjects 9 and 11) | Average Error | 26.2 | 180 |
| 3D Human Pose Estimation | Human3.6M | MPJPE | 26.2 | 160 |
| 3D Human Pose Estimation | Human3.6M (S9, S11) | Average Error (MPJPE Avg) | 26.2 | 94 |
| 3D Pose Estimation | Human3.6M | MPJPE (mm) | 26.2 | 66 |
| 3D Pose Estimation | Total Capture (test) | Mean MPJPE | 29 | 42 |
| 3D Human Pose Estimation | Human3.6M 13 (test) | MPJPE (mm) | 26.2 | 21 |
| 3D Human Pose Estimation | TotalCapture (Seen Cameras (1,3,5,7), Seen Subjects (S1, S2, S3)) | W2 | 19 | 17 |