
From Sparse to Dense: Spatio-Temporal Fusion for Multi-View 3D Human Pose Estimation with DenseWarper

About

In multi-view 3D human pose estimation, models typically rely on images captured simultaneously from different camera views to predict the pose at a specific moment. While this traditional approach provides accurate spatial information, it often overlooks the rich temporal dependencies between adjacent frames. To address this, we propose a novel input scheme for 3D human pose estimation: sparse interleaved input. This scheme uses images captured from different camera views at different time points (e.g., View 1 at time $t$ and View 2 at time $t+\delta$), allowing the model to capture rich spatio-temporal information and effectively boost performance. More importantly, the approach offers two key advantages. First, with $N$ cameras it can theoretically increase the output pose frame rate by a factor of $N$, breaking through single-view frame rate limitations and enhancing the temporal resolution of the output. Second, by using only a sparse subset of the available frames, it reduces data redundancy while achieving better performance. We introduce the DenseWarper model, which leverages epipolar geometry for efficient spatio-temporal heatmap exchange. We conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. The results demonstrate that our method, using only sparse interleaved images as input, outperforms traditional dense multi-view input approaches and achieves state-of-the-art performance. The source code for this work is available at: https://github.com/lingli1724/DenseWarper-ICLR2026
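
The frame-rate claim in the abstract can be illustrated with a small scheduling sketch. It is not taken from the DenseWarper code: the function names, the even phase offset of `1 / (N * fps)` between cameras, and the values of 4 cameras at 50 fps are all assumptions chosen for illustration. Under those assumptions, interleaving the cameras yields evenly spaced capture instants at N times the single-camera rate.

```python
from fractions import Fraction


def interleaved_schedule(num_cameras, camera_fps, frames_per_camera):
    """Return (timestamp_seconds, camera_index) pairs sorted by time.

    Assumption (not from the paper): camera k fires with a phase offset of
    k / (num_cameras * camera_fps), so no two cameras share a timestamp.
    """
    period = Fraction(1, camera_fps)      # time between frames of one camera
    offset = period / num_cameras         # stagger between neighbouring cameras
    schedule = []
    for frame in range(frames_per_camera):
        for cam in range(num_cameras):
            schedule.append((frame * period + cam * offset, cam))
    return sorted(schedule)


def effective_fps(schedule):
    """Frame rate implied by the (uniform) gap between consecutive timestamps."""
    gaps = {later[0] - earlier[0] for earlier, later in zip(schedule, schedule[1:])}
    if len(gaps) != 1:
        raise ValueError("timestamps are not evenly spaced")
    return 1 / gaps.pop()


# Illustrative values only: 4 cameras, each running at 50 fps.
schedule = interleaved_schedule(num_cameras=4, camera_fps=50, frames_per_camera=3)
combined_fps = effective_fps(schedule)    # N times the per-camera rate
```

Exact fractions are used instead of floats so that the check for evenly spaced timestamps does not fail on rounding error.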

Ling Li, Changjie Chen, Yuyan Wang, Jiaqing Lyu, Kenglun Chang, Yiyun Chen, Zhidong Deng • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| 3D Human Pose Estimation | Human3.6M (test) | MPJPE (Average) | 19.4 | 570 |
| 3D Human Pose Estimation | MPI-INF-3DHP | MPJPE | 65.89 | 122 |
| 3D Human Pose Estimation | Human3.6M 2D SimpleBaseline (test) | MPJPE Error (Direction) | 21.2 | 11 |
| 3D Human Pose Estimation | Human3.6M 2D Ground Truth (test) | Dir. | 23.2 | 11 |
| 3D Human Pose Estimation | Human3.6M 2D CPN (test) | Average Performance Score | 33.6 | 9 |
| 3D Human Pose Estimation | Human3.6M | Efficiency per MB | 0.291 | 8 |
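
For readers unfamiliar with the headline metric, MPJPE (mean per-joint position error) is commonly computed as the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below shows that basic definition only. The evaluation protocol behind the numbers above (units, root alignment, joint set) is not stated on this page, so none of it is assumed here, and the three-joint pose is made-up data rather than a dataset sample.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between matching joints, in the input units."""
    if len(predicted) != len(ground_truth):
        raise ValueError("poses must have the same number of joints")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Hypothetical 3-joint pose (made-up coordinates, not benchmark data).
ground_truth = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 0.0)]
predicted = [(3.0, 4.0, 0.0), (100.0, 0.0, 0.0), (100.0, 200.0, 12.0)]

# Per-joint errors are 5, 0 and 12, so the mean is 17 / 3.
error = mpjpe(predicted, ground_truth)
```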
