Any-point Trajectory Modeling for Policy Learning

About

Learning from demonstration is a powerful method for teaching robots new skills, and having more demonstration data often improves policy learning. However, the high cost of collecting demonstration data is a significant bottleneck. Videos, as a rich data source, contain knowledge of behaviors, physics, and semantics, but extracting control-specific information from them is challenging due to the lack of action labels. In this work, we introduce a novel framework, Any-point Trajectory Modeling (ATM), that utilizes video demonstrations by pre-training a trajectory model to predict future trajectories of arbitrary points within a video frame. Once trained, these trajectories provide detailed control guidance, enabling the learning of robust visuomotor policies with minimal action-labeled data. Across over 130 language-conditioned tasks we evaluated in both simulation and the real world, ATM outperforms strong video pre-training baselines by 80% on average. Furthermore, we show effective transfer learning of manipulation skills from human videos and videos from a different robot morphology. Visualizations and code are available at: \url{https://xingyu-lin.github.io/atm}.

Chuan Wen, Xingyu Lin, John So, Kai Chen, Qi Dou, Yang Gao, Pieter Abbeel• 2023

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO	Object Achievement89.4	957
Robot Manipulation	LIBERO (test)	Average Success Rate65.7	220
Robot Manipulation	LIBERO Object	Success Rate68	127
Robot Manipulation	LIBERO	Spatial Success Rate69	116
Robotic Manipulation	LIBERO Long	Success Rate39	91
Robotic Manipulation	LIBERO v1 (test)	Average Success Rate37.5	83
Robotic Manipulation	LIBERO Goal	Success Rate78	42
Robotic Manipulation	LIBERO Average across suites	Success Rate (SR)63	29
Robotic Manipulation	LIBERO Spatial	Success Rate (SR)69	28
Open microwave	Simulation	Success Rate99.4	18

Showing 10 of 26 rows

Other info

Follow for update

@wizwand_team Discord