PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

About

The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which face difficulties in combining multi-view cameras and inferring precise 3D positions and relationships. To address these limitations, we propose a 3D point cloud based policy called PolarNet for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.

Shizhe Chen, Ricardo Garcia, Cordelia Schmid, Ivan Laptev • 2023

Related benchmarks

Task | Dataset | Result | Rank
Robotic Manipulation | RLBench | Avg Success Score: 46.4 | 56
Robotic Manipulation | RLBench (test) | Average Success Rate: 46.4 | 34
Multi-task Robotic Manipulation | RLBench | Avg Success Rate: 48.7 | 16
Robotic Manipulation | RLBench 10 tasks | Pick & Lift Success Rate: 97.8 | 13
Multi-task Robotic Manipulation | RLBench 100 demonstrations (test) | Average Success Rate: 89.8 | 11
Robotic Manipulation | RLBench 18Task | Average Success Rate: 46.4 | 9
Multi-task Robotic Manipulation | GemBench | Avg Success: 38.4 | 8
Vision-based Robotic Manipulation | GemBench (test) | Average Score: 38.4 | 8
Robot Manipulation | RLBench 10 Tasks single-variation | Success Rate: 92.1 | 6
Robotic Manipulation | GemBench Level 3 (Articulated objects) | Success Rate: 38.5 | 6
Showing 10 of 17 rows
