Learnable Triangulation of Human Pose
About
We present two novel solutions for multi-view 3D human pose estimation based on new learnable triangulation methods that combine 3D information from multiple 2D views. The first (baseline) solution is a basic differentiable algebraic triangulation with an addition of confidence weights estimated from the input images. The second solution is based on a novel method of volumetric aggregation from intermediate 2D backbone feature maps. The aggregated volume is then refined via 3D convolutions that produce final 3D joint heatmaps and allow modelling a human pose prior. Crucially, both approaches are end-to-end differentiable, which allows us to directly optimize the target metric. We demonstrate transferability of the solutions across datasets and considerably improve the multi-view state of the art on the Human3.6M dataset. Video demonstration, annotations and additional materials will be posted on our project page (https://saic-violet.github.io/learnable-triangulation).
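To make the first (baseline) idea concrete, here is a minimal NumPy sketch of confidence-weighted algebraic triangulation for a single joint. It is an illustration under stated assumptions, not the authors' implementation: the function names `triangulate_weighted` and `project` are hypothetical, and the confidences are passed in by hand, whereas in the paper they are estimated from the input images. Each view contributes two linear constraints on the homogeneous 3D point; scaling those rows by the view's confidence lets unreliable views count for less in the least-squares solution.

```python
import numpy as np


def project(P, X):
    """Project a 3D point X (3,) with a 3x4 projection matrix P to pixel coordinates (2,)."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_weighted(proj_matrices, points_2d, confidences):
    """Confidence-weighted algebraic (DLT) triangulation of one joint.

    proj_matrices: (C, 3, 4) camera projection matrices
    points_2d:     (C, 2) detected 2D joint position in each view
    confidences:   (C,) non-negative per-view weights (assumed given here)
    Returns the estimated 3D point, shape (3,).
    """
    P = np.asarray(proj_matrices, dtype=float)
    uv = np.asarray(points_2d, dtype=float)
    w = np.asarray(confidences, dtype=float)

    # Per view, two rows: u * P[2] - P[0] and v * P[2] - P[1].
    A = uv[:, :, None] * P[:, 2:3, :] - P[:, :2, :]  # (C, 2, 4)
    A = A * w[:, None, None]                         # down-weight unreliable views
    A = A.reshape(-1, 4)                             # (2C, 4)

    # The homogeneous solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    X_h = vt[-1]
    return X_h[:3] / X_h[3]
```

Because the solve uses only linear algebra and an SVD, the same steps can be written in an autodiff framework, which is what allows gradients to flow from the 3D output back to the 2D detections and the weights.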
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| 3D Human Pose Estimation | MPI-INF-3DHP (test) | PCK 71.3 | 559 |
| 3D Human Pose Estimation | Human3.6M (test) | MPJPE (Average) 34 | 547 |
| 3D Human Pose Estimation | Human3.6M Protocol 1 (test) | Dir. Error (Protocol 1) 19.9 | 183 |
| 3D Human Pose Estimation | Human3.6M | MPJPE 17.7 | 160 |
| 3D Human Pose Estimation | Human3.6M (S9, S11) | Average Error (MPJPE Avg) 20.8 | 94 |
| 3D Pose Estimation | Human3.6M | MPJPE (mm) 20.8 | 66 |
| 3D Human Pose Estimation | Human3.6M v1 (test) | Avg Performance 49.9 | 58 |
| Human Mesh Reconstruction | Human3.6M | -- | 50 |
| Multiview Pedestrian Detection | WILDTRACK (test) | MODA 88.6 | 46 |
| 3D Pose Estimation | Total Capture (test) | Mean MPJPE 25.9 | 42 |