
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

About

If generalist robots are to operate in truly unstructured environments, they need to be able to recognize and reason about novel objects and scenarios. Such objects and scenarios might not be present in the robot's own training data. We propose SuSIE, a method that leverages an image-editing diffusion model to act as a high-level planner by proposing intermediate subgoals that a low-level controller can accomplish. Specifically, we finetune InstructPix2Pix on video data, consisting of both human videos and robot rollouts, such that it outputs hypothetical future "subgoal" observations given the robot's current observation and a language command. We also use the robot data to train a low-level goal-conditioned policy to act as the aforementioned low-level controller. We find that the high-level subgoal predictions can utilize Internet-scale pretraining and visual understanding to guide the low-level goal-conditioned policy, achieving significantly better generalization and precision than conventional language-conditioned policies. We achieve state-of-the-art results on the CALVIN benchmark, and also demonstrate robust generalization on real-world manipulation tasks, beating strong baselines that have access to privileged information or that utilize orders of magnitude more compute and training data. The project website can be found at http://rail-berkeley.github.io/susie.
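The two-level control scheme described in the abstract can be sketched as a simple loop: a high-level model proposes a subgoal observation from the current observation and the language command, and a low-level goal-conditioned policy then acts toward that subgoal for a few steps before a new subgoal is requested. The sketch below is illustrative only and is not the authors' implementation. The names `run_episode`, `propose_subgoal`, `policy`, and `env_step`, the parameters `num_subgoals` and `steps_per_subgoal`, and the toy one-dimensional "observation" are all assumptions made for the example; in the actual method the subgoal proposer would be the finetuned image-editing diffusion model and observations would be camera images.

```python
from typing import Callable, List

Observation = float  # assumption: a single float stands in for a camera image
Action = float


def run_episode(
    obs: Observation,
    command: str,
    propose_subgoal: Callable[[Observation, str], Observation],
    policy: Callable[[Observation, Observation], Action],
    env_step: Callable[[Observation, Action], Observation],
    num_subgoals: int = 5,        # hypothetical value, not from the source
    steps_per_subgoal: int = 10,  # hypothetical value, not from the source
) -> List[Observation]:
    """Alternate between high-level subgoal proposal and low-level control."""
    trajectory = [obs]
    for _ in range(num_subgoals):
        # High level: turn the current observation and the command
        # into a hypothetical future "subgoal" observation.
        subgoal = propose_subgoal(obs, command)
        for _ in range(steps_per_subgoal):
            # Low level: the policy is conditioned on the subgoal only,
            # it never sees the language command.
            action = policy(obs, subgoal)
            obs = env_step(obs, action)
            trajectory.append(obs)
    return trajectory


# Toy stand-ins so the loop can be run end to end.
TARGET = 1.0


def toy_propose_subgoal(obs: Observation, command: str) -> Observation:
    # Pretend the "edited image" shows the state halfway to the target.
    return obs + 0.5 * (TARGET - obs)


def toy_policy(obs: Observation, goal: Observation) -> Action:
    # Move toward the goal, with the step size capped at 0.1.
    return max(-0.1, min(0.1, goal - obs))


def toy_env_step(obs: Observation, action: Action) -> Observation:
    return obs + action


trajectory = run_episode(
    0.0, "move to the target", toy_propose_subgoal, toy_policy, toy_env_step
)
final_distance = abs(trajectory[-1] - TARGET)
```

In this toy setup each subgoal halves the remaining distance to the target, so the low-level controller only ever has to solve a short-horizon reaching problem, which mirrors the division of labor the abstract describes.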

Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Walke, Chelsea Finn, Aviral Kumar, Sergey Levine • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robot Manipulation | LIBERO | -- | 494 |
| Long-horizon robot manipulation | Calvin ABCD→D | Task 1 Completion Rate: 87 | 96 |
| Long-horizon task completion | Calvin ABC->D | Success Rate (1): 87 | 67 |
| Robot Manipulation | Calvin ABC->D | Average Successful Length: 2.69 | 36 |
| Robotic Manipulation | RLBench (test) | Average Success Rate: 21.8 | 34 |
| Instruction-following robotic manipulation | CALVIN ABC→D (unseen environment D) | Success Rate (Length 1): 87 | 29 |
| Robot Manipulation | MetaWorld 50 tasks | Success Rate (Easy): 56 | 21 |
| Robot Manipulation | CALVIN ABC->D 1.0 | Success Rate (1 Inst): 87 | 18 |
| Long-horizon robotic manipulation | CALVIN ABC→D (Zero-shot) | Task 1 Success Rate: 87 | 16 |
| Long-horizon task completion | CALVIN | Success Rate (1 Task): 87 | 15 |
Showing 10 of 19 rows
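
The CALVIN rows above mix two kinds of metric: a success rate at chain length 1 and an average successful length. Assuming these rows follow the usual CALVIN long-horizon protocol, in which each evaluation rollout attempts a chain of five instructions and records how many were completed in a row, the two are related: the average successful length equals the sum of the success rates at lengths 1 through 5. The sketch below illustrates that relationship on made-up rollout counts. The function name `chain_metrics` and the data are hypothetical and are not results from the paper.

```python
from typing import List, Sequence, Tuple


def chain_metrics(
    completed_counts: Sequence[int], chain_length: int = 5
) -> Tuple[List[float], float]:
    """Return (success rate at each length k, average successful length).

    completed_counts[i] is the number of consecutive instructions
    completed in rollout i before the first failure.
    """
    n = len(completed_counts)
    # Success rate at length k: fraction of rollouts completing at least k tasks.
    rates = [
        sum(c >= k for c in completed_counts) / n
        for k in range(1, chain_length + 1)
    ]
    # Average successful length: mean number of tasks completed per rollout.
    avg_len = sum(completed_counts) / n
    return rates, avg_len


# Hypothetical rollouts, for illustration only.
counts = [5, 3, 0, 1, 2, 5, 1, 4]
rates, avg_len = chain_metrics(counts)
```

Here `rates[0]` corresponds to the "Success Rate (1)" style of metric and `avg_len` to "Average Successful Length"; summing `rates` reproduces `avg_len`.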
