MetricHMSR: Metric Human Mesh and Scene Recovery from Monocular Images
About
We introduce MetricHMSR, a novel framework for recovering metric human meshes and 3D scenes from a single monocular image. Existing methods struggle to recover metric scale because of monocular scale ambiguity and weak-perspective camera assumptions. Moreover, their fully coupled feature representations make it difficult to disentangle local pose from global translation, which often forces multi-stage pipelines in which errors accumulate across stages. To address these challenges, we propose MetricHMR (Metric Human Mesh Recovery), which incorporates a bounding camera ray map representation to provide explicit metric cues for human reconstruction, together with a Human Mixture-of-Experts (HumanMoE) that dynamically routes image features to specialized experts, enabling disentangled perception of local human pose and global metric position. Leveraging the recovered metric human as a geometric anchor, we further refine monocular metric depth estimation to achieve more accurate 3D alignment between humans and scenes. Comprehensive experiments demonstrate that our method achieves state-of-the-art performance on both human mesh recovery and metric human-scene reconstruction. Project Page: https://Metaverse-AI-Lab-THU.github.io/MetricHMSR
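The abstract describes the method only at a high level. The NumPy sketch below illustrates two of its ingredients under explicit assumptions, and is not the authors' implementation. (1) A camera ray map, taken here to mean the unit direction `K^{-1} [u, v, 1]^T` of each pixel in a grid covering the person's bounding box, expressed in the full-image camera frame so that it encodes both the intrinsics `K` and where the box sits in the image. (2) Using depth rendered from the recovered metric human as an anchor, reduced here to a single closed-form least-squares scale factor applied to the whole monocular depth map. The function names `bbox_ray_map` and `human_anchored_depth_scale`, the grid size, and the scale-only alignment are all hypothetical; the paper's actual representation and refinement step may differ.

```python
import numpy as np


def bbox_ray_map(K, bbox, out_size=8):
    """Hypothetical helper: unit camera-ray directions for an out_size x out_size
    grid of pixels inside a bounding box, in the full-image camera frame.

    K: (3, 3) intrinsics matrix; bbox: (x_min, y_min, x_max, y_max) in pixels.
    Returns an (out_size, out_size, 3) array of unit vectors.
    """
    x0, y0, x1, y1 = bbox
    # Pixel-centre coordinates of a regular grid covering the box.
    us = x0 + (np.arange(out_size) + 0.5) * (x1 - x0) / out_size
    vs = y0 + (np.arange(out_size) + 0.5) * (y1 - y0) / out_size
    u, v = np.meshgrid(us, vs)                        # each (out_size, out_size)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixel coords
    rays = pix @ np.linalg.inv(K).T                   # K^{-1} [u, v, 1]^T per pixel
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)


def human_anchored_depth_scale(pred_depth, human_depth, human_mask):
    """Hypothetical helper: the scale s minimising sum((s * pred - human)^2)
    over human pixels, then applied to the whole predicted depth map.

    Setting the derivative to zero gives s = <pred, human> / <pred, pred>.
    """
    p = pred_depth[human_mask]
    h = human_depth[human_mask]
    s = float(np.dot(p, h) / np.dot(p, p))
    return s, s * pred_depth


if __name__ == "__main__":
    K = np.array([[1000.0, 0.0, 320.0],
                  [0.0, 1000.0, 240.0],
                  [0.0, 0.0, 1.0]])
    rays = bbox_ray_map(K, (220.0, 140.0, 420.0, 340.0), out_size=4)
    print(rays.shape)  # (4, 4, 3)

    pred = np.full((4, 4), 2.0)    # monocular depth, wrong global scale
    human = np.full((4, 4), 3.0)   # depth rendered from the metric human mesh
    mask = np.zeros((4, 4), dtype=bool)
    mask[1:3, 1:3] = True          # pixels covered by the human
    s, scaled = human_anchored_depth_scale(pred, human, mask)
    print(s)  # 1.5
```

One motivation for a ray-based cue of this kind: two visually identical crops taken from different image locations, or under different focal lengths, receive different ray maps, which is exactly the information a weak-perspective crop model throws away.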
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Human Pose Estimation | 3DPW | PA-MPJPE | 33.6 | 127 |
| Global human motion estimation | RICH | WA-MPJPE | 109.6 | 21 |
| Global motion and trajectory estimation | EMDB 2 | WA-MPJPE | 55.6 | 15 |
| Human local body pose estimation | EMDB 1 | PA-MPJPE | 43.2 | 7 |
| Depth Estimation | PROX | AbsRel | 13 | 4 |
| 3D Position Estimation | SynFocal | RDE | 0.1 | 2 |
| Body Shape and Height Estimation | 3DPW | H-MAE | 70.1 | 2 |
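The table names its metrics but not how they are computed. For reference, here is a minimal NumPy sketch of two standard ones: PA-MPJPE (mean per-joint position error after a Procrustes alignment that removes scale, rotation and translation) and AbsRel (mean absolute relative depth error). These follow the common textbook definitions; each benchmark's official evaluation code may differ in details such as joint sets, valid-pixel masks, or whether values are reported as percentages, and the listing does not state units or scaling. WA-MPJPE, RDE and H-MAE are not sketched here.

```python
import numpy as np


def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE for (J, 3) joint arrays in the same units.

    Finds the similarity transform (scale, rotation, translation) that best
    maps pred onto gt in the least-squares sense, then averages per-joint
    Euclidean errors.
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance (Kabsch/Umeyama).
    U, S, Vt = np.linalg.svd(p.T @ g)
    D = np.eye(3)
    D[2, 2] = np.sign(np.linalg.det(U @ Vt))  # forbid reflections
    R = U @ D @ Vt                            # applied to row vectors as p @ R
    scale = np.trace(np.diag(S) @ D) / np.sum(p ** 2)
    aligned = scale * p @ R + mu_g
    return float(np.linalg.norm(aligned - gt, axis=1).mean())


def abs_rel(pred_depth, gt_depth):
    """Mean of |pred - gt| / gt over pixels with positive ground-truth depth."""
    valid = gt_depth > 0
    err = np.abs(pred_depth[valid] - gt_depth[valid]) / gt_depth[valid]
    return float(np.mean(err))


if __name__ == "__main__":
    gt = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0],
                   [0.0, 100.0, 0.0], [0.0, 0.0, 100.0]])
    pred = 2.0 * gt + 50.0      # same pose, different scale and position
    print(pa_mpjpe(pred, gt))   # close to 0.0: scale and offset are aligned away
    print(abs_rel(np.array([1.1, 1.8, 0.0]), np.array([1.0, 2.0, 0.0])))  # approximately 0.1
```

Because the alignment removes global scale, rotation and translation, PA-MPJPE measures local pose quality only; that is why global and metric placement are evaluated separately with other metrics in the table above.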