
Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation Using Vision Language Models

About

Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and learning-based methods have shown promise, most existing approaches lack common-sense reasoning and are typically designed for single-robot settings, leading to reduced efficiency and robustness in complex environments. To address these limitations, we introduce Co-NavGPT, a novel framework that integrates a Vision Language Model (VLM) as a global planner to enable common-sense multi-robot visual target navigation. Co-NavGPT aggregates sub-maps from multiple robots with diverse viewpoints into a unified global map, encoding robot states and frontier regions. The VLM uses this information to assign frontiers across the robots, facilitating coordinated and efficient exploration. Experiments on the Habitat-Matterport 3D (HM3D) dataset demonstrate that Co-NavGPT outperforms existing baselines in terms of success rate and navigation efficiency, without requiring task-specific training. Ablation studies further confirm the importance of semantic priors from the VLM. We also validate the framework in real-world scenarios using quadrupedal robots. Supplementary video and code are available at: https://sites.google.com/view/co-navgpt2.
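
The pipeline described above (merge per-robot sub-maps, find frontier cells, assign frontiers to robots) can be illustrated with a minimal sketch. This is not the authors' implementation: the grid encoding, the function names `merge_submaps`, `find_frontiers`, and `assign_frontiers`, and the robot names are all hypothetical, and a greedy nearest-frontier rule stands in for the VLM planner purely so the example runs without a model.

```python
import numpy as np

# Assumed cell encoding for this sketch only (not from the paper).
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def merge_submaps(submaps):
    """Fuse same-sized occupancy grids: occupied beats free, free beats unknown."""
    stacked = np.stack(submaps)
    merged = np.full(stacked.shape[1:], UNKNOWN, dtype=int)
    merged[(stacked == FREE).any(axis=0)] = FREE
    merged[(stacked == OCCUPIED).any(axis=0)] = OCCUPIED
    return merged


def find_frontiers(grid):
    """Return (row, col) of free cells that touch an unknown cell (4-connected)."""
    unknown = np.pad(grid == UNKNOWN, 1, constant_values=False)
    near_unknown = (
        unknown[:-2, 1:-1] | unknown[2:, 1:-1]    # cell above / below
        | unknown[1:-1, :-2] | unknown[1:-1, 2:]  # cell left / right
    )
    cells = np.argwhere((grid == FREE) & near_unknown)
    return [(int(r), int(c)) for r, c in cells]


def assign_frontiers(robots, frontiers):
    """Greedy placeholder for the VLM: each robot takes its nearest unclaimed frontier."""
    remaining = list(frontiers)
    assignment = {}
    for name, (r, c) in robots.items():
        if not remaining:
            break
        best = min(remaining, key=lambda f: abs(f[0] - r) + abs(f[1] - c))
        assignment[name] = best
        remaining.remove(best)
    return assignment


# Two hypothetical robots that each observed one side of a 3x4 room.
map_a = np.full((3, 4), UNKNOWN)
map_a[:, 0] = FREE
map_b = np.full((3, 4), UNKNOWN)
map_b[:, 3] = FREE
map_b[1, 3] = OCCUPIED

global_map = merge_submaps([map_a, map_b])
frontiers = find_frontiers(global_map)
plan = assign_frontiers({"robot_a": (1, 0), "robot_b": (2, 3)}, frontiers)
```

In the framework as described, the assignment step would instead be delegated to the VLM, which receives the unified map with robot states and frontier regions and can use semantic priors rather than distance alone.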

Bangguo Yu, Qihao Yuan, Kailai Li, Hamidreza Kasaei, Ming Cao • 2023

Related benchmarks

Task | Dataset | Result | Rank
Object Goal Navigation | HM3D v0.2 (val) | Success Rate (SR) 53.9 | 10
Multi-Robot ObjectNav | HM3D v0.2 (val) | Success Rate (SR) 66.1 | 5
Multi-agent Semantic Object Navigation | Habitat-Matterport3D fire conditions (val) | NS187.9 | 4
Multi-agent Semantic Object Navigation | Habitat-Matterport3D normal conditions (val) | NS185.4 | 4
Multi-Agent Semantic Navigation | HM3D v0.2 | Success Rate (Predicted Semantics) 66.1 | 3