
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation

About

The Zero-Shot Object Navigation (ZSON) task requires embodied agents to find a previously unseen object by navigating in unfamiliar environments. Such goal-oriented exploration relies heavily on the ability to perceive, understand, and reason about the spatial information of the environment. However, current LLM-based approaches convert visual observations into language descriptions and reason in the linguistic space, which loses spatial information. In this paper, we introduce TopV-Nav, an MLLM-based method that reasons directly on the top-view map, which retains sufficient spatial information. To fully unlock the MLLM's spatial reasoning potential from the top-view perspective, we propose the Adaptive Visual Prompt Generation (AVPG) method to adaptively construct a semantically rich top-view map. It enables the agent to directly use the spatial information contained in the top-view map to conduct thorough reasoning. In addition, we design a Dynamic Map Scaling (DMS) mechanism to dynamically zoom the top-view map to preferred scales, enhancing local fine-grained reasoning. We also devise a Potential Target Driven (PTD) mechanism to predict and use potential target locations, facilitating global and human-like exploration. Experiments on the MP3D and HM3D datasets demonstrate the superiority of TopV-Nav.

Linqing Zhong, Chen Gao, Zihan Ding, Yue Liao, Huimin Ma, Shifeng Zhang, Xu Zhou, Si Liu • 2024
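
The abstract describes Dynamic Map Scaling only at a high level: the top-view map is zoomed to a preferred scale so that local regions can be reasoned about in finer detail. The sketch below illustrates the general idea of cropping a window of a grid map and enlarging it. It is not the authors' implementation; the function name `zoom_map`, the integer-label grid representation, and the nearest-neighbour enlargement are all assumptions made for illustration.

```python
from typing import List

Grid = List[List[int]]  # hypothetical top-view map: one integer label per cell


def zoom_map(grid: Grid, center_row: int, center_col: int,
             half_size: int, factor: int) -> Grid:
    """Crop a square window around a cell and enlarge it by an integer factor.

    Illustrative only: the window is clamped to the map bounds and each cell
    is repeated `factor` times in both directions (nearest-neighbour zoom).
    """
    rows, cols = len(grid), len(grid[0])

    # Clamp the crop window so it never leaves the map.
    top = max(0, center_row - half_size)
    bottom = min(rows, center_row + half_size + 1)
    left = max(0, center_col - half_size)
    right = min(cols, center_col + half_size + 1)
    window = [row[left:right] for row in grid[top:bottom]]

    zoomed: Grid = []
    for row in window:
        # Repeat every cell horizontally, then repeat the widened row vertically.
        wide = [cell for cell in row for _ in range(factor)]
        zoomed.extend(list(wide) for _ in range(factor))
    return zoomed


# Hypothetical 4x4 map with a 2x2 labelled region in the middle.
example_map = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
    [0, 0, 0, 0],
]
local_view = zoom_map(example_map, center_row=1, center_col=1, half_size=0, factor=2)
```

In this toy setting, a smaller `half_size` with a larger `factor` corresponds to a closer, finer-grained view of the same region. How the scale is actually chosen in TopV-Nav is not specified in the abstract.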

Related benchmarks

Task                     Dataset                   Result                        Rank
Object Goal Navigation   MP3D                      SR 35.2                       96
ObjectGoal Navigation    MP3D (val)                Success Rate 35.2             68
Object Goal Navigation   HM3D 0.1                  SR 53                         35
Object Navigation        HM3D v1 (val)             SR 52                         32
Object Navigation        HM3D v0.1                 Success Rate (SR) 53          18
Navigation               CityNav (test unseen)     Navigation Error (NE) 79.5    14
Navigation               CityNav unseen (val)      Navigation Error (NE) 77.4    14
Navigation               CityNav seen (val)        Navigation Error (NE) 102.7   14
Object Goal Navigation   HM3D (1000 episodes)      Success Rate (SR) 45.9        13