Any-point Trajectory Modeling for Policy Learning

About

Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict the future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across over 130 language-conditioned tasks, evaluated in both simulation and the real world, ATM outperforms strong video pre-training baselines by 80% on average. Furthermore, we show effective transfer learning of manipulation skills from human videos and from videos of a different robot morphology. Visualizations and code are available at: https://xingyu-lin.github.io/atm
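The two-stage idea in the abstract, first predict where arbitrary image points will move, then condition a policy on those predicted trajectories, can be sketched in a few lines. The code below is a toy illustration, not the paper's model: `predict_point_trajectories` stands in for ATM's learned trajectory transformer with a simple linear extrapolation of tracked points, and `trajectory_guided_action` stands in for the learned trajectory-conditioned policy head. All function names and the velocity-extrapolation rule are hypothetical.

```python
import numpy as np

def predict_point_trajectories(track_history, horizon):
    """Toy stand-in for ATM's trajectory model: linearly extrapolate each
    tracked point's 2D image-plane position into the future.
    track_history: (num_points, num_past_steps, 2) pixel coordinates.
    Returns: (num_points, horizon, 2) predicted future positions."""
    last = track_history[:, -1, :]                           # (P, 2) latest positions
    vel = track_history[:, -1, :] - track_history[:, -2, :]  # (P, 2) per-step velocity
    steps = np.arange(1, horizon + 1)[None, :, None]         # (1, H, 1) step indices
    return last[:, None, :] + vel[:, None, :] * steps        # broadcast to (P, H, 2)

def trajectory_guided_action(pred_traj):
    """Toy trajectory-conditioned 'policy': output the mean of the predicted
    next-step point positions (a placeholder for a learned policy head)."""
    return pred_traj[:, 0, :].mean(axis=0)

# Two tracked points: one moving right at 1 px/frame, one moving up at 1 px/frame.
history = np.array([
    [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]],
])
future = predict_point_trajectories(history, horizon=3)
action = trajectory_guided_action(future)
print(future[0])  # point 0 continues rightward: [[3. 0.] [4. 0.] [5. 0.]]
print(action)     # mean of next-step positions: [1.5 1.5]
```

In the actual framework the trajectory model is pre-trained on action-free videos, so only the small policy head needs action-labeled demonstrations, which is what lets ATM learn with minimal labeled data.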

Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robot Manipulation | LIBERO | Goal Achievement | 79.6 | 700
Robot Manipulation | LIBERO (test) | Average Success Rate | 65.7 | 184
Open Microwave | Simulation | Success Rate | 99.4 | 18
Block Stack | Simulation | Success Rate | 91.9 | 18
Glass | Simulation | Success Rate | 58.6 | 18
Open Drawer | Real-World (test) | Success Rate | 30 | 11
Robotic Manipulation | MetaWorld | Door Open Success Rate | 75.3 | 10
Motion Forecasting | Panthera High Motion (test) | Variance (Velocity) | 10 | 9
Motion Forecasting | Panthera Combined (test) | Variance (Velocity) | 2.42 | 9
Text-conditioned Trajectory Prediction | LIBERO-90 | Side MSE | 47.82 | 8

Showing 10 of 17 rows
