Estimating Body and Hand Motion in an Ego-sensed World
About
We present EgoAllo, a system for human motion estimation from a head-mounted device. Using only egocentric SLAM poses and images, EgoAllo guides sampling from a conditional diffusion model to estimate 3D body pose, height, and hand parameters that capture a device wearer's actions in the allocentric coordinate frame of the scene. To achieve this, our key insight is in representation: we propose spatial and temporal invariance criteria for improving model performance, from which we derive a head motion conditioning parameterization that improves estimation by up to 18%. We also show how the bodies estimated by our system can improve hand estimation: the resulting kinematic and temporal constraints can reduce world-frame errors in single-frame estimates by 40%. Project page: https://egoallo.github.io/
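The exact conditioning parameterization is defined in the paper; as a minimal sketch of what the spatial and temporal invariance criteria mean in practice, the snippet below computes head-motion features from world-frame SE(3) head poses that are unchanged under a global yaw rotation and horizontal translation of the scene. All function names and the z-up convention are assumptions for illustration, not EgoAllo's actual API.

```python
import numpy as np

def rotz(a):
    """Rotation about the world z (up) axis."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def se3(R, p):
    """Assemble a 4x4 rigid transform from rotation R and translation p."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T

def head_motion_features(poses):
    """poses: (T, 4, 4) world-frame head poses, z-up convention (assumed).

    Returns (rel, heights):
      rel     -- (T-1, 4, 4) relative transforms T_t^{-1} T_{t+1}; invariant
                 to any global rigid transform of the trajectory (a temporal
                 invariance criterion).
      heights -- (T,) head height above the floor; invariant to horizontal
                 translation and yaw of the world frame (a spatial
                 invariance criterion).
    """
    rel = np.linalg.inv(poses[:-1]) @ poses[1:]
    heights = poses[:, 2, 3]
    return rel, heights

# Build a toy 5-frame head trajectory.
poses = np.stack(
    [se3(rotz(0.3 * t), [0.5 * t, np.sin(t), 1.6 + 0.05 * t]) for t in range(5)]
)

# Re-express the same motion in a different world frame (yaw + horizontal shift):
# the features do not change.
G = se3(rotz(1.2), [3.0, -2.0, 0.0])
rel_a, h_a = head_motion_features(poses)
rel_b, h_b = head_motion_features(G @ poses)
assert np.allclose(rel_a, rel_b) and np.allclose(h_a, h_b)
```

Conditioning a diffusion model on such frame-invariant features, rather than on raw world-frame poses, means the model does not have to learn that two trajectories differing only by a global yaw or horizontal offset describe the same motion.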
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Hand Pose Estimation | EgoExo4D 1.0 (test) | PA-MPJPE (mm) | 14.38 | 13 |
| Body Estimation | AMASS (test) | MPJPE (mm) | 119.7 | 8 |
| Body Estimation | RICH | MPJPE (mm) | 176.2 | 8 |
| Body Estimation | Aria Digital Twins | MPJPE (mm) | 155.1 | 8 |
| Egocentric Pose Estimation | HMD Setting 90° FoV | MPJPE (mm) | 113.8 | 6 |
| Egocentric Pose Estimation | HMD Setting 180° FoV | MPJPE (mm) | 90.67 | 6 |
| Egocentric Motion Reconstruction | AMASS | MPJPE (mm) | 99.32 | 4 |
| Motion Reconstruction | EgoExo4D real-world | Jerk | 0.065 | 4 |