Direct Multi-view Multi-person 3D Pose Estimation

About

We present the Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images. Previous methods either estimate 3D joint locations from a costly volumetric representation or reconstruct each person's 3D pose from multiple detected 2D poses. MvP instead regresses the multi-person 3D poses directly, in a clean and efficient way, without relying on intermediate tasks. Specifically, MvP represents skeleton joints as learnable query embeddings and lets them progressively attend to and reason over the multi-view information from the input images to directly regress the actual 3D joint locations. To improve the accuracy of this simple pipeline, MvP uses a hierarchical scheme to concisely represent the query embeddings of multi-person skeleton joints and introduces an input-dependent query adaptation approach. Further, MvP introduces a novel geometrically guided attention mechanism, called projective attention, to fuse the cross-view information for each joint more precisely. MvP also introduces a RayConv operation that integrates the view-dependent camera geometry into the feature representations to augment the projective attention. Experiments show that MvP outperforms state-of-the-art methods on several benchmarks while being much more efficient. Notably, it achieves 92.3% AP25 on the challenging Panoptic dataset, improving upon the previous best approach [36] by 9.8%. MvP is general and also extends to recovering human meshes represented by the SMPL model, making it useful for modeling multi-person body shapes. Code and models are available at https://github.com/sail-sg/mvp.
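The abstract does not spell out how projective attention uses camera geometry. One building block that geometry-guided cross-view fusion commonly relies on is projecting a candidate 3D joint location into every camera view, so that image features can be read near the projected 2D points. The sketch below illustrates only that projection step with an ideal pinhole camera (no lens distortion). It is not the authors' implementation: the function names, the camera dictionary layout (`rotation`, `translation`, `focal`, `principal`) and the example values are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def project_point(
    point_world: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    focal: Sequence[float],
    principal: Sequence[float],
) -> Tuple[float, float]:
    """Project one 3D world point to pixel coordinates (hypothetical helper).

    Assumes an ideal pinhole camera: X_cam = R @ X_world + t, then a
    perspective divide followed by focal scaling and principal-point shift.
    """
    # World frame -> camera frame.
    cam = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide, then convert to pixels.
    u = focal[0] * cam[0] / cam[2] + principal[0]
    v = focal[1] * cam[1] / cam[2] + principal[1]
    return (u, v)


def project_to_views(
    point_world: Sequence[float], cameras: List[Dict]
) -> List[Tuple[float, float]]:
    """Project the same 3D point into every camera view (hypothetical helper)."""
    return [project_point(point_world, **cam) for cam in cameras]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two made-up cameras that differ only by a sideways shift.
cameras = [
    {"rotation": IDENTITY, "translation": [0.0, 0.0, 0.0],
     "focal": [1000.0, 1000.0], "principal": [320.0, 240.0]},
    {"rotation": IDENTITY, "translation": [-0.5, 0.0, 0.0],
     "focal": [1000.0, 1000.0], "principal": [320.0, 240.0]},
]

anchors = project_to_views([0.5, 0.0, 2.0], cameras)
```

In a pipeline of the kind the abstract describes, per-view 2D points like `anchors` would tell an attention module where in each image to look for evidence about a given joint. How MvP actually samples and weights those features is described in the paper and the linked repository, not here.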

Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng • 2021

Related benchmarks

Task | Dataset | Metric | Result | Rank
3D Human Pose Estimation | Campus (test) | Actor 1 Score | 99.3 | 66
3D Human Pose Estimation | Campus | PCP | 96.6 | 36
3D Multi-person Pose Estimation | Shelf (test) | Actor 1 Score | 99.3 | 27
3D Human Pose Estimation | Shelf (test) | Actor 1 Score | 98.2 | 27
3D Pose Estimation | shelf | PCP Actor 1 | 99.3 | 25
3D Human Pose Estimation | CMU Panoptic JLT+15 (test) | MPJPE | 15.76 | 14
3D Multi-person Pose Estimation (In-domain) | Shelf 2 (test) | PCP | 97.4 | 12
3D Multi-person Pose Estimation (In-domain) | Campus 2 (test) | PCP | 96.6 | 11
Multi-person 3D Pose Estimation | Panoptic | MPJPE (mm) | 15.8 | 10
3D Human Pose Estimation | CMU Panoptic Average K=1-7 CMU0 (test) | AP@25 | 21.6 | 10
Showing 10 of 20 rows
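Several of the rows above report MPJPE, the mean per-joint position error: the average Euclidean distance between predicted and ground-truth 3D joints, in the same unit as the coordinates (millimetres in the rows labelled mm). The sketch below shows that computation for a single pose. The function name and the toy coordinates are illustrative assumptions, and the benchmark protocols (how predictions are matched to people, which joints are evaluated) are not reproduced.

```python
import math
from typing import Sequence


def mpjpe(pred: Sequence[Sequence[float]], gt: Sequence[Sequence[float]]) -> float:
    """Mean per-joint position error for one pose (illustrative helper).

    `pred` and `gt` are equal-length sequences of (x, y, z) joint positions.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Average the per-joint Euclidean distances.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy two-joint pose: the first joint is exact, the second is off by 5 units.
error = mpjpe([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
              [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
```

Lower MPJPE is better, whereas the PCP and AP figures in the table are percentages where higher is better.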
