TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting

About

Existing volumetric methods for 3D human pose estimation are accurate, but they are computationally expensive and optimized for single time-step prediction. We present TEMPO, an efficient multi-view pose estimation model that learns a robust spatiotemporal representation, improving pose accuracy while also tracking and forecasting human pose. We significantly reduce computation compared to the state of the art by recurrently computing per-person 2D pose features, fusing both spatial and temporal information into a single representation. In doing so, our model is able to use spatiotemporal context to predict more accurate human poses without sacrificing efficiency. We further use this representation to track human poses over time as well as predict future poses. Finally, we demonstrate that our model is able to generalize across datasets without scene-specific fine-tuning. TEMPO achieves 10% better MPJPE with a 33× improvement in FPS compared to TesseTrack on the challenging CMU Panoptic Studio dataset.

Rohan Choudhury, Kris Kitani, Laszlo A. Jeni • 2023

Related benchmarks

Task                             Dataset                                                     Result            Rank
3D Human Pose Estimation         Campus                                                      PCP 97.3          36
3D Pose Estimation               shelf                                                       PCP Actor 1 99.3  25
3D Multi-person Pose Estimation  MVOR 23 (test)                                              MPJPE (mm) 102    16
3D Human Pose Estimation         CMU Panoptic JLT+15 (test)                                  MPJPE 14.68       14
3D Human Pose Estimation         Human3.6M (S9)                                              PCP 82            14
3D Human Pose Estimation         Chi3D                                                       Invalid Rate 660  14
Multi-person 3D Pose Estimation  Shelf (transfer)                                            PCP 96.4          13
3D Multi-person Pose Estimation  Panoptic (test)                                             PCP 98.1          12
3D Multi-person Pose Estimation  Human3.6M, Shelf, Campus, and MVOR Averaged Generalization  PCP 59.1          12
Multi-person 3D Pose Estimation  Panoptic                                                    MPJPE (mm) 14.7   10
Showing 10 of 13 rows
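
Several results on this page are reported as MPJPE (mean per-joint position error). As a rough illustration of what that number measures, here is a minimal sketch, not the authors' evaluation code: it averages the Euclidean distance between predicted and ground-truth joints for a single pose. The function name `mpjpe`, the input layout (a list of 3D points per pose), and the toy coordinates are assumptions made for this example. Real benchmark protocols may add steps such as root alignment or person matching that are omitted here.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error for one pose.

    pred, gt: equal-length sequences of (x, y, z) joint positions.
    The result is in the same units as the inputs (e.g. mm).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Euclidean distance per joint, averaged over all joints.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


# Toy pose with two joints (made-up coordinates, in mm).
gt_pose = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pred_pose = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(pred_pose, gt_pose))  # 2.5
```

Lower is better: a prediction that matches the ground truth exactly gives an MPJPE of 0.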
