Direct Dynamic Retargeting for Humanoid Imitation Learning from Videos

About

Imitation Learning from monocular video demonstrations provides a scalable approach for teaching complex skills to humanoid robots. However, translating human motion to humanoids requires overcoming significant morphological mismatches. Standard approaches rely on Geometric Retargeting or Indirect Dynamic Retargeting pipelines. We identify that these intermediate kinematic projections introduce a geometric bias, restricting the search space and yielding suboptimal dynamic behaviors. In this paper, we propose Direct Dynamic Retargeting (DDR), a novel single-stage framework that generates high-fidelity, dynamically feasible trajectories directly from expert videos. By formulating the problem in the task space and leveraging a sampling-based Model Predictive Control solver within a physics simulator, DDR natively optimizes over complex contact sequences while mitigating input drift. Our experiments demonstrate that bypassing the geometric bias allows DDR to outperform state-of-the-art baselines in demonstration tracking accuracy. Furthermore, we establish that providing such physically viable references to RL agents accelerates training convergence and enhances the final execution of agile and balancing behaviors. Source code will be made publicly available.
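To make the phrase "sampling-based Model Predictive Control" concrete, here is a minimal sketch of one common variant (an MPPI-style softmin-weighted update). It is an illustration under strong assumptions, not the DDR solver: the abstract does not specify the sampling scheme, and a toy 1-D double integrator stands in for the physics simulator. The names `rollout_cost` and `sampling_mpc_step` and the parameters `sigma` and `temperature` are hypothetical.

```python
import numpy as np

def rollout_cost(x0, controls, reference, dt=0.05):
    """Simulate a toy 1-D double integrator and return the tracking cost.

    x0: (2,) position and velocity; controls: (H,) accelerations;
    reference: (H,) target positions (a stand-in for a task-space reference).
    """
    pos, vel = x0
    cost = 0.0
    for u, ref in zip(controls, reference):
        vel = vel + u * dt          # integrate acceleration
        pos = pos + vel * dt        # integrate velocity
        cost += (pos - ref) ** 2 + 1e-4 * u ** 2
    return cost

def sampling_mpc_step(x0, nominal, reference, rng,
                      n_samples=256, sigma=2.0, temperature=0.1):
    """One MPPI-style update: perturb the nominal plan, weight samples by cost."""
    noise = rng.normal(0.0, sigma, size=(n_samples, nominal.shape[0]))
    noise[0] = 0.0                  # keep the current plan as one candidate
    candidates = nominal + noise
    costs = np.array([rollout_cost(x0, c, reference) for c in candidates])
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return weights @ candidates     # cost-weighted average of the samples

# Usage: refine a plan that should bring the position from 0 to 1.
rng = np.random.default_rng(0)
horizon = 20
reference = np.full(horizon, 1.0)
x0 = np.array([0.0, 0.0])
plan = np.zeros(horizon)
initial_cost = rollout_cost(x0, plan, reference)
for _ in range(30):
    plan = sampling_mpc_step(x0, plan, reference, rng)
final_cost = rollout_cost(x0, plan, reference)
```

Because the update only needs rollout costs and no gradients, the same loop structure can in principle wrap a simulator with discontinuous contact dynamics, which is the property the abstract appeals to.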

Constant Roux, Ludovic De Matteïs, Armand Jordana, Valentin Guillet, Nicolas Mansard, Olivier Stasse, Philippe Souères • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reference Tracking | Squat movement | Mean Laplacian Error (m) | 0.055 | 6 |
| Reference Tracking | One-foot balance movement | Joint RMSE | 0.667 | 5 |
| Keypoint tracking | One-foot balance | Laplacian Error | 0.101 | 3 |
| Motion Retargeting | SMPL Trajectories Pistol Squat | Infeasible Segment Percentage | 0.00e+0 | 3 |
| Motion Retargeting | SMPL Trajectories Balancing Stick | Infeasible Segments Rate | 0.2 | 3 |
| Motion Retargeting | Kung fu | Contact Mismatch Rate | 4.23 | 3 |
| Motion Retargeting | One-foot balance | Contact Sequence Mismatch Rate | 13.71 | 3 |
| Motion Retargeting | Pistol Squat | Contact Sequence Mismatch Rate | 5.35 | 3 |
| Motion Retargeting | Balancing Stick | Contact Mismatch Rate | 7.86 | 3 |
| Reference Tracking | Kung fu movement | Joint RMSE (rad) | 0.627 | 3 |
Showing 10 of 28 rows
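
The listing does not define its metrics. As a rough guide to reading the "Joint RMSE (rad)" rows, the sketch below assumes the usual definition: the root of the mean squared joint-angle error, averaged over all time steps and joints. The function name `joint_rmse` and the array layout (time steps by joints) are assumptions, and the benchmark may aggregate differently.

```python
import numpy as np

def joint_rmse(reference, tracked):
    """Root-mean-square joint-angle error over all time steps and joints.

    Both inputs are arrays of shape (T, J) in radians (assumed layout).
    """
    reference = np.asarray(reference, dtype=float)
    tracked = np.asarray(tracked, dtype=float)
    if reference.shape != tracked.shape:
        raise ValueError("trajectories must have the same shape")
    return float(np.sqrt(np.mean((reference - tracked) ** 2)))

# Usage: a constant 0.1 rad offset on every joint gives an RMSE of 0.1 rad.
ref = np.zeros((4, 2))
trk = np.full((4, 2), 0.1)
error = joint_rmse(ref, trk)
```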
