VoxelKeypointFusion: Generalizable Multi-View Multi-Person Pose Estimation
About
In the rapidly evolving field of computer vision, accurately estimating the poses of multiple individuals from several viewpoints is a formidable challenge, especially when the estimates must also be reliable. This work presents an extensive evaluation of how well multi-view multi-person pose estimators generalize to unseen datasets and introduces a new algorithm that performs strongly on this task. It also studies the improvements gained by additionally using depth information. Because the new approach generalizes well not only to unseen datasets but also to different keypoints, this work also presents the first multi-view multi-person whole-body estimator. To support further research on these topics, all of the work is publicly accessible.
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Human Pose Estimation | Campus | PCP | 91.1 | 36 |
| 3D Human Pose Estimation | Shelf (test) | -- | -- | 27 |
| 3D Multi-person Pose Estimation | MVOR 23 (test) | MPJPE (mm) | 119 | 16 |
| 3D Human Pose Estimation | Human3.6M (S9) | PCP | 96.9 | 14 |
| 3D Human Pose Estimation | Chi3D | Invalid Rate | 10 | 14 |
| Multi-person 3D Pose Estimation | Shelf (transfer) | PCP | 98.8 | 13 |
| 3D Multi-person Pose Estimation | Human3.6M, Shelf, Campus, and MVOR Averaged Generalization | PCP | 85.3 | 12 |
| 3D Multi-person Pose Estimation | Panoptic (test) | PCP | 97.1 | 12 |
| 3D Human Pose Estimation | shelf | Latency (ms) | 38 | 11 |
| 3D Multi-person Pose Estimation | human36m, shelf, campus, mvor, chi3d, tsinghua Averaged generalization (test) | PCP | 89 | 10 |
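Most results listed above are reported as PCP (Percentage of Correct Parts) or MPJPE (Mean Per-Joint Position Error). As an illustration of how these metrics are commonly computed, here is a minimal NumPy sketch. It is not the evaluation code used for these benchmarks. It assumes that a single person's joints are given as `(J, 3)` arrays in millimetres and that limbs are a hypothetical list of joint-index pairs. It also uses one common PCP variant, in which a limb counts as correct if the mean error of its two endpoints is at most `alpha` (assumed 0.5) times the ground-truth limb length. The helper names `mpjpe` and `pcp` are hypothetical.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error (same unit as the input, e.g. mm).

    pred, gt: arrays of shape (J, 3) with matching joint order (assumed).
    """
    # Euclidean distance per joint, then the mean over all joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def pcp(pred: np.ndarray, gt: np.ndarray, limbs, alpha: float = 0.5) -> float:
    """Fraction of limbs judged correct under an assumed common PCP variant.

    A limb (a, b) is correct if the average endpoint error is at most
    alpha times the ground-truth limb length.
    """
    correct = 0
    for a, b in limbs:
        endpoint_err = (np.linalg.norm(pred[a] - gt[a])
                        + np.linalg.norm(pred[b] - gt[b])) / 2.0
        limb_len = np.linalg.norm(gt[a] - gt[b])
        if endpoint_err <= alpha * limb_len:
            correct += 1
    return correct / len(limbs)


if __name__ == "__main__":
    # Toy 3-joint chain with two 100 mm limbs (illustrative data only).
    gt = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 100.0], [0.0, 0.0, 200.0]])
    pred = gt + np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 110.0, 0.0]])
    print(mpjpe(pred, gt))                  # 40.0
    print(pcp(pred, gt, [(0, 1), (1, 2)]))  # 0.5
```

In the toy example, the first limb's mean endpoint error is 5 mm, which is within 50 mm, so it counts as correct. The second limb's error is 55 mm, so it does not. Multi-person benchmarks additionally need a matching step between predicted and ground-truth people before these per-person scores are averaged. That step is omitted here.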