
CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models

About

Foundation models pre-trained on web-scale data are shown to encapsulate extensive world knowledge beneficial for robotic manipulation in the form of task planning. However, the actual physical implementation of these plans often relies on task-specific learning methods, which require significant data collection and struggle with generalizability. In this work, we introduce Robotic Manipulation through Spatial Constraints of Parts (CoPa), a novel framework that leverages the common sense knowledge embedded within foundation models to generate a sequence of 6-DoF end-effector poses for open-world robotic manipulation. Specifically, we decompose the manipulation process into two phases: task-oriented grasping and task-aware motion planning. In the task-oriented grasping phase, we employ foundation vision-language models (VLMs) to select the object's grasping part through a novel coarse-to-fine grounding mechanism. During the task-aware motion planning phase, VLMs are utilized again to identify the spatial geometry constraints of task-relevant object parts, which are then used to derive post-grasp poses. We also demonstrate how CoPa can be seamlessly integrated with existing robotic planning algorithms to accomplish complex, long-horizon tasks. Our comprehensive real-world experiments show that CoPa possesses a fine-grained physical understanding of scenes, capable of handling open-set instructions and objects with minimal prompt engineering and without additional training. Project page: https://copa-2024.github.io/
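The abstract's two-phase decomposition can be sketched in code. This is a minimal, hypothetical outline only: every function, class, and value below is a stand-in (the paper's actual system queries foundation VLMs for part grounding and constraint identification, which are mocked here as stubs returning placeholder results).

```python
# Hedged sketch of CoPa's two-phase pipeline as described in the abstract.
# All names, poses, and return values are illustrative placeholders, not the
# paper's implementation: real VLM calls and a constraint solver are mocked.

from dataclasses import dataclass
from typing import List

@dataclass
class Pose:
    """6-DoF end-effector pose: position (x, y, z) and orientation (roll, pitch, yaw)."""
    position: tuple
    orientation: tuple

def select_grasp_part(image, instruction):
    # Phase 1 (task-oriented grasping): a VLM would select the object's
    # grasping part via coarse-to-fine grounding (object first, then part).
    # Mocked: return a fixed part label.
    return "hammer handle"

def identify_spatial_constraints(image, instruction):
    # Phase 2 (task-aware motion planning): a VLM would name the spatial
    # geometry constraints among task-relevant parts.
    # Mocked: return fixed textual constraints.
    return ["hammer head above nail", "hammer face parallel to nail head"]

def plan_poses(image, instruction) -> List[Pose]:
    """Produce a grasp pose followed by constraint-derived post-grasp poses."""
    part = select_grasp_part(image, instruction)
    constraints = identify_spatial_constraints(image, instruction)
    # Placeholder grasp pose on the selected part.
    grasp = Pose(position=(0.4, 0.0, 0.1), orientation=(0.0, 1.57, 0.0))
    # A solver would derive post-grasp poses satisfying `constraints`; mocked.
    post_grasp = Pose(position=(0.4, 0.2, 0.3), orientation=(0.0, 1.57, 0.0))
    return [grasp, post_grasp]
```

The sketch only illustrates the control flow the abstract describes: grasping and motion planning are separate queries, and the output is a short sequence of 6-DoF poses rather than a learned policy.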

Haoxu Huang, Fanqi Lin, Yingdong Hu, Shengjie Wang, Yang Gao · 2024

Related benchmarks

Task | Dataset | Result | Rank
Average Robotic Manipulation Success | Real-world Hardware | Success Rate: 60 | 5
Hammer Nail | Real-world Hardware | Success Rate: 30 | 5
Knock Tower | Real-world Hardware | Success Rate: 80 | 5
Reach Blocks | Real-world Hardware | Success Rate: 60 | 5
Sweep Toys | Real-world Hardware | Success Rate: 70 | 5
Robotic Manipulation | Real World (unseen environments and tasks) | Task 1 Success Rate: 40 | 4
Object-centric Manipulation | Real-world, 10 object-centric tasks | Egg Placing Success Rate: 20 | 4
Articulated Object Manipulation | Real-world, 3 articulated-object tasks | Drawer Opening Success Rate: 40 | 3
