EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses

About

Egocentric video generation with fine-grained control through body motion is a key requirement towards embodied AI agents that can simulate, predict, and plan actions. In this work, we propose EgoControl, a pose-controllable video diffusion model trained on egocentric data. We train a video prediction model to condition future frame generation on explicit 3D body pose sequences. To achieve precise motion control, we introduce a novel pose representation that captures both global camera dynamics and articulated body movements, and integrate it through a dedicated control mechanism within the diffusion process. Given a short sequence of observed frames and a sequence of target poses, EgoControl generates temporally coherent and visually realistic future frames that align with the provided pose control. Experimental results demonstrate that EgoControl produces high-quality, pose-consistent egocentric videos, paving the way toward controllable embodied video simulation and understanding.

Enrico Pallotta, Sina Mokhtarzadeh Azar, Lars Doorenbos, Serdar Ozsoy, Umar Iqbal, Juergen Gall • 2025

Related benchmarks

Task                                 Dataset                     Result                     Rank
Egocentric latent state prediction   HOMAGE                      L2 Distance (2s): 0.099    7
Egocentric latent state prediction   LEMMA                       L2 Error (2s): 0.091       7
Egocentric latent state prediction   Ego-Exo4D Bike              L2 Distance (2s): 0.085    7
Egocentric latent state prediction   Ego-Exo4D Cooking           L2 Error (2s): 0.09        7
Egocentric Video Generation          Nymeria (PEVA/EgoControl)   LPIPS: 24.3                3