Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

About

If generalist robots are to operate in truly unstructured environments, they need to be able to recognize and reason about novel objects and scenarios. Such objects and scenarios might not be present in the robot's own training data. We propose SuSIE, a method that leverages an image-editing diffusion model to act as a high-level planner by proposing intermediate subgoals that a low-level controller can accomplish. Specifically, we finetune InstructPix2Pix on video data, consisting of both human videos and robot rollouts, such that it outputs hypothetical future "subgoal" observations given the robot's current observation and a language command. We also use the robot data to train a low-level goal-conditioned policy to act as the aforementioned low-level controller. We find that the high-level subgoal predictions can utilize Internet-scale pretraining and visual understanding to guide the low-level goal-conditioned policy, achieving significantly better generalization and precision than conventional language-conditioned policies. We achieve state-of-the-art results on the CALVIN benchmark, and also demonstrate robust generalization on real-world manipulation tasks, beating strong baselines that have access to privileged information or that utilize orders of magnitude more compute and training data. The project website can be found at http://rail-berkeley.github.io/susie.

Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Walke, Chelsea Finn, Aviral Kumar, Sergey Levine • 2023
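
The two-level control loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `run_hierarchical_episode`, `steps_per_subgoal`, and all the `toy_*` helpers are hypothetical names, the "observation" is a pair of floats standing in for a camera image, and the toy planner and policy are simple stand-ins for the finetuned image-editing diffusion model and the learned goal-conditioned policy.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: a real system would use camera images and robot actions.
Observation = Tuple[float, ...]
Action = Tuple[float, ...]


def run_hierarchical_episode(
    propose_subgoal: Callable[[Observation, str], Observation],
    act: Callable[[Observation, Observation], Action],
    env_step: Callable[[Action], Observation],
    obs: Observation,
    command: str,
    max_steps: int,
    steps_per_subgoal: int,  # assumed fixed replanning interval
) -> Tuple[Observation, List[Observation]]:
    """Alternate between high-level subgoal proposals and low-level control."""
    subgoals: List[Observation] = []
    subgoal = obs
    for t in range(max_steps):
        if t % steps_per_subgoal == 0:
            # High level: propose a hypothetical future observation
            # from the current observation and the language command.
            subgoal = propose_subgoal(obs, command)
            subgoals.append(subgoal)
        # Low level: the goal-conditioned policy sees only the current
        # observation and the subgoal, not the language command.
        obs = env_step(act(obs, subgoal))
    return obs, subgoals


def toy_propose_subgoal(obs: Observation, command: str) -> Observation:
    # Toy planner: shift the first coordinate by one unit per subgoal.
    dx = 1.0 if command == "move right" else -1.0
    return (obs[0] + dx, obs[1])


def toy_act(obs: Observation, goal: Observation) -> Action:
    # Toy policy: step toward the goal, at most 0.1 per coordinate.
    return tuple(max(-0.1, min(0.1, g - o)) for o, g in zip(obs, goal))


def make_toy_env(start: Observation) -> Callable[[Action], Observation]:
    state = list(start)

    def step(action: Action) -> Observation:
        for i, a in enumerate(action):
            state[i] += a
        return tuple(state)

    return step


start = (0.0, 0.0)
final_obs, subgoals = run_hierarchical_episode(
    toy_propose_subgoal, toy_act, make_toy_env(start),
    obs=start, command="move right", max_steps=30, steps_per_subgoal=10,
)
```

In this sketch the language command reaches only the high-level planner, while the low-level policy is conditioned purely on the current observation and the proposed subgoal, mirroring the division of labor the abstract describes. The fixed replanning interval is an assumption made for simplicity.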

Related benchmarks

Task	Dataset	Result
Robot Manipulation	LIBERO	--	1025
Long-horizon robot manipulation	Calvin ABCD→D	Task 1 Completion Rate: 87	140
Robotic Manipulation	Calvin ABCD→D	Avg Length: 2.69	139
Robotic Manipulation	LIBERO Long	--	97
Long-horizon task completion	Calvin ABC->D	Success Rate (1): 87	72
Robotic Manipulation	Calvin ABC->D	Task-1 Score: 87	71
Sequential Robotic Manipulation	CALVIN	Success Rate (1 task): 87	63
Robot Manipulation	Calvin ABC->D	Average Successful Length: 2.69	62
Robotic Manipulation	RLBench (test)	Average Success Rate: 21.8	49
Robotic Manipulation	CALVIN D->D	Average Length: 2.8	40

Showing 10 of 30 rows
