Monocular Models are Strong Learners for Multi-View Human Mesh Recovery

About

Multi-view human mesh recovery (HMR) is broadly deployed in diverse domains where high accuracy and strong generalization are essential. Existing approaches can be broadly grouped into geometry-based and learning-based methods. However, geometry-based methods (e.g., triangulation) rely on cumbersome camera calibration, while learning-based approaches often generalize poorly to unseen camera configurations due to the lack of multi-view training data, limiting their performance in real-world scenarios. To enable calibration-free reconstruction that generalizes to arbitrary camera setups, we propose a training-free framework that leverages pretrained single-view HMR models as strong priors, eliminating the need for multi-view training data. Our method first constructs a robust and consistent multi-view initialization from single-view predictions, and then refines it via test-time optimization guided by multi-view consistency and anatomical constraints. Extensive experiments demonstrate state-of-the-art performance on standard benchmarks, surpassing multi-view models trained with explicit multi-view supervision.

Haoyu Xie, Shengkai Xu, Cheng Guo, Muhammad Usama Saleem, Wenhan Wu, Chen Chen, Ahmed Helmy, Pu Wang, Hongfei Xue • 2026

Related benchmarks

Task                      Dataset                                    Metric    Result  Rank
3D Human Pose Estimation  MPI-INF-3DHP                               PCK       99.9    114
Human Mesh Recovery       Human3.6M Protocol 1 (test)                PA-MPJPE  20.6    33
3D Human Pose Estimation  MPI-INF-3DHP cross-camera settings (test)  MPJPE     43.71   2
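
The results listed above use three standard 3D pose metrics: MPJPE, PA-MPJPE, and PCK. The following is a minimal NumPy sketch of how these metrics are commonly computed, not the evaluation code behind this page. The function names, the (J, 3) joint-array layout, and the default PCK threshold of 150 (a value often used with millimetre units on MPI-INF-3DHP) are assumptions for illustration; the page itself does not state units or thresholds.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints.

    pred, gt: arrays of shape (J, 3) in the same units and coordinate frame.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align pred to gt with the best similarity transform (scale, rotation,
    translation) in the least-squares sense, using an SVD-based solution."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # 3x3 cross-covariance between centered prediction and ground truth.
    H = p.T @ g
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction to the ground truth."""
    return mpjpe(procrustes_align(pred, gt), gt)


def pck(pred, gt, threshold=150.0):
    """Percentage of joints whose error is below `threshold`.

    The default threshold is an assumption; use the value specified by the
    benchmark protocol being reproduced.
    """
    err = np.linalg.norm(pred - gt, axis=-1)
    return float((err < threshold).mean() * 100.0)
```

Because PA-MPJPE removes global scale, rotation, and translation before measuring error, it isolates articulated pose accuracy, whereas MPJPE also penalizes errors in global placement. This is one reason the two numbers are not directly comparable across rows.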