OmniEgoCap: Camera-Agnostic Sequence-Level Egocentric Motion Reconstruction
About
The proliferation of commercial egocentric devices offers a unique lens into human behavior, yet reconstructing full-body 3D motion remains difficult due to frequent self-occlusion and the 'out-of-sight' nature of the wearer's limbs. While head and hand trajectories provide sparse anchor points, current methods often overfit to specific hardware optics or rely on expensive, post-hoc optimizations that compromise motion naturalness. In this paper, we present OmniEgoCap, a unified diffusion framework that scales egocentric reconstruction to diverse capture setups. By shifting from short-term windowed estimation to sequence-level inference, our method captures a global perspective and recovers invariant physical attributes, such as height and body proportions, that provide critical constraints for disambiguating head-only cues. To ensure hardware-agnostic generalization, we introduce a geometry-aware visibility augmentation strategy that treats intermittent hand appearances as principled geometric constraints rather than missing data. Our architecture jointly predicts temporally coherent motion and consistent body shape, establishing a new state-of-the-art on public benchmarks and demonstrating robust performance across diverse, in-the-wild environments.
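The paper does not publish its implementation, but the idea of treating intermittent hand visibility as a geometric constraint can be illustrated with a minimal sketch. The snippet below uses a simple cone-shaped frustum model for a head-mounted camera: hand positions outside the field of view are dropped from the observation and replaced by a validity flag, so a downstream model sees "hand not visible" as structured information rather than missing data. All function names, the cone FoV model, and the toy trajectory are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def hand_visibility_mask(head_pos, head_fwd, hand_pos, fov_deg=90.0):
    """Per-frame boolean mask: True when the hand lies inside a simple
    cone-shaped camera frustum centered on the head's forward direction.
    Hypothetical helper illustrating geometry-aware visibility masking."""
    to_hand = hand_pos - head_pos                               # (T, 3) head -> hand
    norms = np.linalg.norm(to_hand, axis=-1, keepdims=True)
    dirs = to_hand / np.clip(norms, 1e-8, None)                 # unit directions
    fwd = head_fwd / np.clip(
        np.linalg.norm(head_fwd, axis=-1, keepdims=True), 1e-8, None)
    cos_angle = np.sum(dirs * fwd, axis=-1)                     # cos of view angle
    return cos_angle >= np.cos(np.radians(fov_deg / 2.0))

def mask_hand_observations(hand_pos, visible):
    """Zero out hand observations outside the frustum and return a validity
    flag, so absence acts as a constraint instead of silent missing data."""
    obs = hand_pos.copy()
    obs[~visible] = 0.0
    return obs, visible.astype(np.float32)

# Toy sequence: head at the origin looking along +z; the hand sweeps from
# directly in front of the camera (z = 1) to behind it (z = -1).
T = 5
head_pos = np.zeros((T, 3))
head_fwd = np.tile(np.array([0.0, 0.0, 1.0]), (T, 1))
hand_pos = np.stack([np.zeros(T), np.zeros(T),
                     np.linspace(1.0, -1.0, T)], axis=-1)
vis = hand_visibility_mask(head_pos, head_fwd, hand_pos, fov_deg=90.0)
obs, flags = mask_hand_observations(hand_pos, vis)
# The hand is visible only while it stays inside the forward cone.
```

Varying `fov_deg` per training sample (e.g. 90° vs. 180°, matching the HMD settings in the benchmarks below) is one plausible way such an augmentation could expose a model to diverse capture hardware during training.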
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Egocentric Pose Estimation | HMD Setting 90° FoV | MPJPE | 93.58 | 6 |
| Egocentric Pose Estimation | HMD Setting 180° FoV | MPJPE | 73.09 | 6 |
| Egocentric Motion Reconstruction | AMASS | MPJPE (mm) | 80.71 | 4 |
| Motion Reconstruction | EgoExo4D real-world | Jerk | 0.048 | 4 |