
Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation

About

We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation: interacting with unseen objects in novel scenes without test-time adaptation. While typical approaches rely on a large amount of demonstration data for such generalization, we propose an approach that leverages web videos to predict plausible interaction plans and learns a task-agnostic transformation to obtain robot actions in the real world. Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal, and can be trained with diverse videos on the web, including those of humans and robots manipulating everyday objects. We use these 2D track predictions to infer a sequence of rigid transforms of the object to be manipulated, and obtain robot end-effector poses that can be executed in an open-loop manner. We then refine this open-loop plan by predicting residual actions through a closed-loop policy trained with a few embodiment-specific demonstrations. We show that this approach of combining scalably learned track prediction with a residual policy requiring minimal in-domain robot-specific data enables diverse generalizable robot manipulation, and present a wide array of real-world robot manipulation results across unseen tasks, objects, and scenes. https://homangab.github.io/track2act/

Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta, Shubham Tulsiani • 2024
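
The abstract's step of inferring rigid transforms of the object from predicted point tracks can be illustrated with a minimal sketch. It assumes the tracked points have already been lifted to 3D (for example with a depth image), which the abstract does not specify, and it uses the standard Kabsch least-squares fit as a stand-in rather than the authors' actual procedure. The function name `fit_rigid_transform` and the demo values are hypothetical.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Least-squares rigid fit (Kabsch): find R, t with dst ≈ src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points (N >= 3, not collinear).
    Assumption: the 2D tracks have been lifted to 3D beforehand.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Center both point sets on their centroids.
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)

    # Cross-covariance matrix between the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)

    # SVD gives the optimal rotation; the sign correction avoids reflections.
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation maps the rotated source centroid onto the target centroid.
    t = dst_c - R @ src_c
    return R, t


# Hypothetical demo: points on an object rotated 90 degrees about z and shifted.
points_t0 = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 0.1])
points_t1 = points_t0 @ R_true.T + t_true

R_est, t_est = fit_rigid_transform(points_t0, points_t1)
```

Applying such a fit between the first frame and each future time-step would yield a sequence of object transforms, from which end-effector poses could be derived once a grasp is fixed; how Track2Act actually does this is described in the paper, not here.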

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Planning Billiard Shots | Billiard Simulator | Accuracy: 8 | 10 |
| Motion forecasting | Panthera High motion (test) | Variance (Velocity): 8.01 | 9 |
| Motion forecasting | Panthera Combined (test) | Var (V): 1.89 | 9 |
| Robot Manipulation | Meta-World low-data regime | Door Open Success: 88 | 8 |
| Motion forecasting | MammalMotion All Data (High motion) | ADE: 0.136 | 7 |
| Motion forecasting | MammalMotion All Data (Combined) | ADE: 0.053 | 7 |
| Robot Manipulation Skill Adaptation | Instruction-Guided Skill Adaptation Simulation v1 (test) | Task 1 Success Rate: 44 | 5 |
| Robotic Manipulation | IsaacSkill (within-distribution) | Pouring SR: 88 | 5 |
| Poked Motion Generation | Pexels Dense | Min MSE: 138.7 | 3 |
| Robotic Manipulation | Real-world experiment (test) | Pouring Success Rate: 44 | 3 |
