OmniEgoCap: Camera-Agnostic Sequence-Level Egocentric Motion Reconstruction
About
The proliferation of commercial egocentric devices offers a unique lens into human behavior, yet reconstructing full-body 3D motion remains difficult due to frequent self-occlusion and the 'out-of-sight' nature of the wearer's limbs. While head and hand trajectories provide sparse anchor points, current methods often overfit to specific hardware optics or rely on expensive, post-hoc optimizations that compromise motion naturalness. In this paper, we present OmniEgoCap, a unified diffusion framework that scales egocentric reconstruction to diverse capture setups. By shifting from short-term windowed estimation to sequence-level inference, our method captures a global perspective and recovers invariant physical attributes, such as height and body proportions, that provide critical constraints for disambiguating head-only cues. To ensure hardware-agnostic generalization, we introduce a geometry-aware visibility augmentation strategy that treats intermittent hand appearances as principled geometric constraints rather than missing data. Our architecture jointly predicts temporally coherent motion and consistent body shape, establishing a new state-of-the-art on public benchmarks and demonstrating robust performance across diverse, in-the-wild environments.
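The paper does not publish its implementation, but the idea of treating intermittent hand visibility as a geometric constraint can be illustrated with a minimal sketch. The snippet below uses a simple cone-shaped frustum model for a head-mounted camera: hand positions outside the field of view are dropped from the observation and replaced by a validity flag, so a downstream model sees "hand not visible" as structured information rather than missing data. All function names, the cone FoV model, and the toy trajectory are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def hand_visibility_mask(head_pos, head_fwd, hand_pos, fov_deg=90.0):
    """Per-frame boolean mask: True when the hand lies inside a simple
    cone-shaped camera frustum centered on the head's forward direction.
    Hypothetical helper illustrating geometry-aware visibility masking."""
    to_hand = hand_pos - head_pos                               # (T, 3) head -> hand
    norms = np.linalg.norm(to_hand, axis=-1, keepdims=True)
    dirs = to_hand / np.clip(norms, 1e-8, None)                 # unit directions
    fwd = head_fwd / np.clip(
        np.linalg.norm(head_fwd, axis=-1, keepdims=True), 1e-8, None)
    cos_angle = np.sum(dirs * fwd, axis=-1)                     # cos of view angle
    return cos_angle >= np.cos(np.radians(fov_deg / 2.0))

def mask_hand_observations(hand_pos, visible):
    """Zero out hand observations outside the frustum and return a validity
    flag, so absence acts as a constraint instead of silent missing data."""
    obs = hand_pos.copy()
    obs[~visible] = 0.0
    return obs, visible.astype(np.float32)

# Toy sequence: head at the origin looking along +z; the hand sweeps from
# directly in front of the camera (z = 1) to behind it (z = -1).
T = 5
head_pos = np.zeros((T, 3))
head_fwd = np.tile(np.array([0.0, 0.0, 1.0]), (T, 1))
hand_pos = np.stack([np.zeros(T), np.zeros(T),
                     np.linspace(1.0, -1.0, T)], axis=-1)
vis = hand_visibility_mask(head_pos, head_fwd, hand_pos, fov_deg=90.0)
obs, flags = mask_hand_observations(hand_pos, vis)
# The hand is visible only while it stays inside the forward cone.
```

Varying `fov_deg` per training sample (e.g. 90° vs. 180°, matching the HMD settings in the benchmarks below) is one plausible way such an augmentation could expose a model to diverse capture hardware during training.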
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Egocentric Pose Estimation | HMD Setting 90° FoV | MPJPE | 93.58 | 6 |
| Egocentric Pose Estimation | HMD Setting 180° FoV | MPJPE | 73.09 | 6 |
| Egocentric Motion Reconstruction | AMASS | MPJPE (mm) | 80.71 | 4 |
| Motion Reconstruction | EgoExo4D real-world | Jerk | 0.048 | 4 |