Egocentric Whole-Body Human Mesh Recovery with Prior-Guided Learning
About
Egocentric human mesh recovery (HMR) from monocular head-mounted cameras is increasingly important for AR/VR applications, but remains challenging due to the lack of reliable ground-truth (GT) annotations based on parametric human body models such as SMPL and SMPL-X for real egocentric images. Existing egocentric HMR methods typically rely on pseudo-GT and focus on body pose estimation, which limits their ability to recover fine-grained whole-body details such as hands and face. We study egocentric whole-body human mesh recovery and propose a prior-guided learning framework that reconstructs whole-body meshes from a single egocentric image. We construct more accurate optimization-based pseudo-GT aligned with 3D joint supervision, and leverage multiple priors by adapting an exocentric HMR foundation model together with a diffusion-based pose prior. A deterministic undistortion module is further adopted to handle fisheye distortions in egocentric images. Experiments across multiple egocentric benchmarks demonstrate improved whole-body reconstruction compared to state-of-the-art methods, and show that our optimization-based pseudo-GT is substantially more accurate than existing regression-based pseudo-GT. To facilitate reproducibility, the code and dataset annotations are publicly available at https://github.com/naso06/EgoSMPLX.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Body-only Human Mesh Recovery | EgoPW (test) | PA-MPJPE | 87.28 | 3 |
| Body-only Human Mesh Recovery | SceneEgo (test) | PA-MPJPE | 50.86 | 3 |
| Hand-only Human Mesh Recovery | EgoWholeBody (subsampled train) | PA-MPJPE | 17.07 | 3 |
| Whole-Body Human Mesh Recovery | EgoWholeBody (subsampled train) | PA-MPJPE | 57.75 | 3 |
| Egocentric Human Mesh Recovery | Inference Efficiency NVIDIA RTX 3090 | GFLOPs | 191.9 | 3 |
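
For readers unfamiliar with the PA-MPJPE metric reported in the table, the following is a minimal sketch of how it is commonly computed: the predicted joints are aligned to the ground-truth joints with a similarity transform (scale, rotation, translation) found by Procrustes analysis, and the mean per-joint Euclidean error is taken after alignment. The function name `pa_mpjpe` and the `(J, 3)` input layout are assumptions made for illustration, not taken from the authors' code, and each benchmark's official evaluation script may differ in joint selection and units (millimetres are typical, but this page does not state them).

```python
import numpy as np


def pa_mpjpe(pred, gt):
    """Procrustes-aligned mean per-joint position error.

    pred, gt: arrays of shape (J, 3) with matching joint order.
    Hypothetical helper for illustration; result is in the units of the inputs.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Remove translation by centering both joint sets.
    mu_pred = pred.mean(axis=0)
    mu_gt = gt.mean(axis=0)
    p = pred - mu_pred
    g = gt - mu_gt

    # Optimal rotation from the SVD of the cross-covariance matrix.
    u, s, vt = np.linalg.svd(p.T @ g)

    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    correction = np.array([1.0, 1.0, d])
    rot = vt.T @ np.diag(correction) @ u.T

    # Optimal isotropic scale for the centered, rotated prediction.
    scale = (s * correction).sum() / (p ** 2).sum()

    # Apply the similarity transform and average the per-joint distances.
    aligned = scale * p @ rot.T + mu_gt
    return np.linalg.norm(aligned - gt, axis=1).mean()
```

Because the alignment removes global scale, rotation, and translation, the metric reflects errors in articulated pose only; a prediction that differs from the ground truth by a rigid motion and uniform scaling scores approximately zero.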