
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

About

Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. In this work, we present COME-robot, the first closed-loop robotic system utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot incorporates two key innovative modules: (i) a multi-level open-vocabulary perception and situated reasoning module that enables effective exploration of the 3D environment and target object identification using commonsense knowledge and situated information, and (ii) an iterative closed-loop feedback and restoration mechanism that verifies task feasibility, monitors execution success, and traces failure causes across different modules for robust failure recovery. Through comprehensive experiments involving 8 challenging real-world mobile and tabletop manipulation tasks, COME-robot demonstrates a significant improvement in task success rate (~35%) compared to state-of-the-art methods. We further conduct comprehensive analyses to elucidate how COME-robot's design facilitates failure recovery, free-form instruction following, and long-horizon task planning.

Peiyuan Zhi, Zhiyuan Zhang, Yu Zhao, Muzhi Han, Zeyu Zhang, Zhitian Li, Ziyuan Jiao, Baoxiong Jia, Siyuan Huang • 2024
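
The closed-loop feedback and restoration idea described in the abstract (execute a step, verify it, trace the failure cause, then recover) can be illustrated with a minimal Python sketch. This is not the COME-robot implementation: `StepResult`, `run_closed_loop`, and the `execute`, `verify`, and `recover` callbacks are hypothetical names introduced here, and in a real system the verification and recovery would presumably be driven by the vision-language model rather than by plain functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    """Outcome of verifying one executed step (hypothetical structure)."""
    success: bool
    failure_cause: Optional[str] = None  # e.g. "perception" or "grasp"


def run_closed_loop(
    plan: List[str],
    execute: Callable[[str], None],
    verify: Callable[[str], StepResult],
    recover: Callable[[str, str], List[str]],
    max_retries: int = 3,
) -> bool:
    """Run the plan step by step, verifying each step after execution.

    When verification fails, `recover` receives the failed step and its
    traced cause and returns replacement steps, which are placed at the
    front of the remaining plan. Returns False once more than
    `max_retries` failures have occurred in total.
    """
    queue = list(plan)
    failures = 0
    while queue:
        step = queue.pop(0)
        execute(step)
        result = verify(step)
        if result.success:
            continue
        failures += 1
        if failures > max_retries:
            return False
        # Replan: recovery steps run before the rest of the original plan.
        queue = recover(step, result.failure_cause or "unknown") + queue
    return True
```

In this sketch a failed grasp could, for example, be recovered by returning a re-observation step followed by the same grasp step, so the loop retries with fresh perception before continuing with the rest of the plan.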

Related benchmarks

Task | Dataset | Result | Rank
Move crumpled paper (brush nearby) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 25 | 2
Move egg (open view) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 0.2 | 2
Move grape/cherry (open view) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 20 | 2
Move screw (towel nearby) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 0.00e+0 | 2
Move sushi (open view) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 14 | 2
Move tiny candy (towel nearby) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 0.11 | 2
Pick up bowl (apple inside) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 17 | 2
Pick up box (apple on top) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 43 | 2
Pick up towel (orange on top) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 50 | 2
Put apple on plate (container obstructs) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate 29 | 2
Showing 10 of 12 rows
