
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

About

We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene by first inferring a text description of those objects, then generating an image that represents a natural, human-like arrangement of them, and finally physically arranging the objects to match that goal image. We show that this is possible zero-shot using DALL-E, without any further example arrangements, data collection, or training. DALL-E-Bot is fully autonomous and is not restricted to a pre-defined set of objects or scenes, thanks to DALL-E's web-scale pre-training. Encouraging real-world results, from both human studies and objective metrics, show that integrating web-scale diffusion models into robotics pipelines is a promising direction for scalable, unsupervised robot learning.
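The three-stage loop described above (describe the objects, generate a goal image, rearrange to match it) can be sketched in Python. This is a minimal, hypothetical illustration and not the authors' implementation: `DetectedObject`, `build_prompt`, `match_objects`, the prompt template, and the `generate_and_detect` callback (standing in for DALL-E plus an object detector on the generated image) are all assumptions. For simplicity, objects are matched by caption text only and moved by translation only.

```python
from dataclasses import dataclass
from itertools import permutations
from math import dist
from typing import Callable


@dataclass
class DetectedObject:
    label: str  # hypothetical per-object caption, e.g. "a fork"
    x: float    # object centre in a shared top-down frame (assumed units)
    y: float


def build_prompt(objects: list[DetectedObject]) -> str:
    """Stage 1: turn per-object captions into one text prompt (hypothetical template)."""
    labels = [o.label for o in objects]
    listed = labels[0] if len(labels) == 1 else ", ".join(labels[:-1]) + " and " + labels[-1]
    return f"A top-down photo of {listed}, neatly arranged."


def match_objects(scene: list[DetectedObject],
                  goal: list[DetectedObject]) -> dict[int, tuple[float, float]]:
    """Stage 3: map each scene object index to a target position in the goal image.

    Objects are paired only with goal objects sharing the same label; among
    same-label objects, the pairing with the least total travel distance is
    chosen by brute force (fine for a handful of duplicates).
    """
    plan: dict[int, tuple[float, float]] = {}
    for label in {o.label for o in scene}:
        src = [i for i, o in enumerate(scene) if o.label == label]
        dst = [j for j, o in enumerate(goal) if o.label == label]
        if len(src) != len(dst):
            raise ValueError(f"object count mismatch for {label!r}")

        def cost(perm: tuple[int, ...]) -> float:
            return sum(dist((scene[i].x, scene[i].y), (goal[j].x, goal[j].y))
                       for i, j in zip(src, perm))

        best = min(permutations(dst), key=cost)
        for i, j in zip(src, best):
            plan[i] = (goal[j].x, goal[j].y)
    return plan


def rearrange(scene: list[DetectedObject],
              generate_and_detect: Callable[[str], list[DetectedObject]]):
    """Full loop: prompt -> generated goal layout (stage 2, supplied by the caller) -> plan."""
    prompt = build_prompt(scene)
    goal = generate_and_detect(prompt)  # hypothetical: text-to-image model + detector
    return prompt, match_objects(scene, goal)
```

In use, `generate_and_detect` would wrap the image generator and a detector; in a quick test it can simply return a fixed list of `DetectedObject`s. Keeping generation behind a callback makes it easy to swap in a different model without touching the matching step.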

Ivan Kapelyukh, Vitalis Vosylius, Edward Johns · 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image-to-Image Translation | summer-winter Global 512x512 | FID | 90.51 | 12 |
| Image-to-Image Translation | horse-zebra Local 512x512 | FID | 239.6 | 11 |
| Object Rearrangement | Shopping Scene | Success Rate: Apple in Bowl | 14 | 8 |
| Geometric rearrangement | Pool ball scene | Success Rate (X Shape) | 0 | 8 |
