Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
About
Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. In this work, we present COME-robot, the first closed-loop robotic system utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot incorporates two key innovative modules: (i) a multi-level open-vocabulary perception and situated reasoning module that enables effective exploration of the 3D environment and target object identification using commonsense knowledge and situated information, and (ii) an iterative closed-loop feedback and restoration mechanism that verifies task feasibility, monitors execution success, and traces failure causes across different modules for robust failure recovery. Through comprehensive experiments involving 8 challenging real-world mobile and tabletop manipulation tasks, COME-robot demonstrates a significant improvement in task success rate (~35%) compared to state-of-the-art methods. We further conduct comprehensive analyses to elucidate how COME-robot's design facilitates failure recovery, free-form instruction following, and long-horizon task planning.
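The closed-loop feedback and restoration idea described above can be illustrated with a minimal sketch. This is not the COME-robot implementation: the names `StepResult`, `run_closed_loop`, `execute`, and `replan`, the failure-cause labels, and the retry budget are all hypothetical, and the GPT-4V calls are replaced by plain callbacks. It only shows the general pattern of executing a step, checking its outcome, and replanning from the reported failure cause.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepResult:
    success: bool
    # Hypothetical label for the module that failed, e.g. "perception" or "grasp".
    failure_cause: Optional[str] = None


def run_closed_loop(
    plan: List[str],
    execute: Callable[[str], StepResult],
    replan: Callable[[str, str], List[str]],
    max_retries: int = 3,
) -> bool:
    """Execute each step, check its outcome, and replan on failure.

    Returns True if every step eventually succeeds, and False once the
    retry budget is exhausted.
    """
    queue = list(plan)
    retries = 0
    while queue:
        step = queue.pop(0)
        result = execute(step)
        if result.success:
            continue
        if retries >= max_retries:
            return False
        retries += 1
        # Recovery steps replace the failed step at the front of the queue.
        queue = replan(step, result.failure_cause or "unknown") + queue
    return True
```

In a real system, `execute` would wrap the robot's skills plus a success check from the vision-language model, and `replan` would ask the model for recovery steps given the failure cause. Both are left as callbacks here so that the loop stays independent of any particular robot or model API.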
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Move crumpled paper (brush nearby) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 25 | 2 |
| Move egg (open view) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 0.2 | 2 |
| Move grape/cherry (open view) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 20 | 2 |
| Move screw (towel nearby) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 0 | 2 |
| Move sushi (open view) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 14 | 2 |
| Move tiny candy (towel nearby) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 0.11 | 2 |
| Pick up bowl (apple inside) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 17 | 2 |
| Pick up box (apple on top) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 43 | 2 |
| Pick up towel (orange on top) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 50 | 2 |
| Put apple on plate (container obstructs) | Custom Robot Manipulation Scenes 1.0 (test) | Success Rate | 29 | 2 |