What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data

About

A long-standing goal in robotics is to build robots that can perform a wide range of daily tasks from perceptions obtained with their onboard sensors and specified only via natural language. While substantial advances have recently been achieved in language-driven robotics by leveraging end-to-end learning from pixels, there is no clear and well-understood process for making various design choices due to the underlying variation in setups. In this paper, we conduct an extensive study of the most critical challenges in learning language conditioned policies from offline free-form imitation datasets. We further identify architectural and algorithmic techniques that improve performance, such as a hierarchical decomposition of the robot control learning, a multimodal transformer encoder, discrete latent plans, and a self-supervised contrastive loss that aligns video and language representations. By combining the results of our investigation with our improved model components, we are able to present a novel approach that significantly outperforms the state of the art on the challenging language conditioned long-horizon robot manipulation CALVIN benchmark. We have open-sourced our implementation to facilitate future research in learning to perform many complex manipulation skills in a row specified with natural language. The codebase and trained models are available at http://hulc.cs.uni-freiburg.de

Oier Mees, Lukas Hermann, Wolfram Burgard • 2022
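
The abstract mentions a self-supervised contrastive loss that aligns video and language representations, but this page does not give its exact form. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style loss over a batch of paired embeddings. The function names, the temperature value, and the toy vectors are assumptions made for this example and are not taken from the paper or its codebase.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_on_diagonal(logits):
    # Mean of -log softmax(row_i)[i]: row i should select column i.
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(video_emb, lang_emb, temperature=0.07):
    """Generic InfoNCE-style loss; item i of each list is assumed to be a matching pair."""
    video = [_normalize(e) for e in video_emb]
    lang = [_normalize(e) for e in lang_emb]
    # Pairwise cosine similarities, sharpened by the (assumed) temperature.
    logits = [
        [sum(a * b for a, b in zip(v, t)) / temperature for t in lang]
        for v in video
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the video-to-language and language-to-video directions.
    return 0.5 * (
        _cross_entropy_on_diagonal(logits) + _cross_entropy_on_diagonal(transposed)
    )


# Toy embeddings (hypothetical): identical pairs versus pairs shifted by one position.
video = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.1], [0.3, 0.2, 1.0]]
aligned_loss = symmetric_contrastive_loss(video, video)
mismatched_loss = symmetric_contrastive_loss(video, video[1:] + video[:1])
```

In this toy setup the loss is lower when each video embedding is paired with its own language embedding than when the pairs are shifted, which is the behaviour such an alignment objective is meant to reward.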

Related benchmarks

Task                             Dataset               Metric                         Result  Rank
Long-horizon robot manipulation  Calvin ABCD→D         Task 1 Completion Rate         88.9    96
Long-horizon task completion     Calvin ABC->D         Success Rate (1)               89.2    67
Robot Manipulation               Calvin ABC->D         Average Successful Length      0.67    36
Robot Manipulation               CALVIN ABC->D 1.0     Success Rate (1 Inst)          41.8    18
Long-horizon task completion     CALVIN                Success Rate (1 Task)          41.8    15
Robotic Manipulation             CALVIN D->D           Success Rate (Length 1)        82.7    12
Robot Manipulation               CALVIN 10% ABCD → D   Success Rate (L=1)             66.8    11
Robot Manipulation               CALVIN D->D           Average Successful Length      2.64    6
Long-horizon robot manipulation  CALVIN 10% data       Task 1 Completion Rate         66.8    4
Long-horizon robot manipulation  CALVIN unseen lang    Task Completion Rate (1 Task)  71.5    4
Showing 10 of 11 rows
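
The benchmark rows report both success rates at a given chain position and an average successful length. As a minimal sketch of how these two kinds of metric relate, assuming the usual CALVIN convention of evaluating chains of five consecutive instructions, the example below derives both from per-rollout counts. The function name and the rollout counts are hypothetical and are not results from the paper.

```python
def chain_metrics(completed_counts, chain_length=5):
    """completed_counts[i] is the number of consecutive instructions rollout i
    solved before its first failure (between 0 and chain_length)."""
    n = len(completed_counts)
    # Success rate at k: fraction of rollouts that solved at least the first k instructions.
    success_rates = [
        sum(c >= k for c in completed_counts) / n
        for k in range(1, chain_length + 1)
    ]
    # Average successful length: mean number of instructions solved in a row.
    average_length = sum(completed_counts) / n
    return success_rates, average_length


# Hypothetical rollout outcomes, for illustration only.
counts = [5, 3, 0, 1, 2, 5, 0, 4]
rates, avg_len = chain_metrics(counts)
```

Under this definition the average successful length equals the sum of the success rates over chain positions 1 to 5, so a method can score well at position 1 and still have a low average length if it rarely completes later instructions.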
