Monocular Models are Strong Learners for Multi-View Human Mesh Recovery

About

Multi-view human mesh recovery (HMR) is broadly deployed in diverse domains where high accuracy and strong generalization are essential. Existing approaches can be broadly grouped into geometry-based and learning-based methods. However, geometry-based methods (e.g., triangulation) rely on cumbersome camera calibration, while learning-based approaches often generalize poorly to unseen camera configurations due to the lack of multi-view training data, limiting their performance in real-world scenarios. To enable calibration-free reconstruction that generalizes to arbitrary camera setups, we propose a training-free framework that leverages pretrained single-view HMR models as strong priors, eliminating the need for multi-view training data. Our method first constructs a robust and consistent multi-view initialization from single-view predictions, and then refines it via test-time optimization guided by multi-view consistency and anatomical constraints. Extensive experiments demonstrate state-of-the-art performance on standard benchmarks, surpassing multi-view models trained with explicit multi-view supervision.

Haoyu Xie, Shengkai Xu, Cheng Guo, Muhammad Usama Saleem, Wenhan Wu, Chen Chen, Ahmed Helmy, Pu Wang, Hongfei Xue • 2026

Related benchmarks

Task	Dataset	Result
3D Human Pose Estimation	MPI-INF-3DHP	MPJPE 39	122
Human Mesh Recovery	Human3.6M Protocol 1 (test)	PA-MPJPE 20.6	33
3D Human Pose Estimation	MPI-INF-3DHP cross-camera settings (test)	MPJPE 43.71	2
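
The metrics in the table above, MPJPE and PA-MPJPE, follow standard definitions and are typically reported in millimetres. The sketch below shows one common way to compute them with NumPy. It is not taken from the paper's code: the function names `mpjpe`, `procrustes_align`, and `pa_mpjpe` are hypothetical, and the input is assumed to be a single pose stored as a (J, 3) array of joint positions.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def procrustes_align(pred, gt):
    """Align pred (J, 3) to gt (J, 3) with the best similarity transform
    (scale, rotation, translation) in the least-squares sense."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g          # centre both point sets
    H = p.T @ g                            # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                     # optimal rotation
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * (p @ R.T) + mu_g

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: MPJPE after similarity alignment to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)
```

Because PA-MPJPE removes global scale, rotation, and translation before measuring error, it is never larger than the best achievable MPJPE under a similarity transform, which is why PA-MPJPE values are usually lower than MPJPE values for the same predictions.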
