TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation

About

The Zero-Shot Object Navigation (ZSON) task requires embodied agents to find a previously unseen object by navigating in unfamiliar environments. Such goal-oriented exploration relies heavily on the ability to perceive, understand, and reason about the spatial information of the environment. However, current LLM-based approaches convert visual observations into language descriptions and reason in the linguistic space, which loses spatial information. In this paper, we introduce TopV-Nav, an MLLM-based method that reasons directly on a top-view map that retains sufficient spatial information. To fully unlock the MLLM's spatial reasoning potential from the top-view perspective, we propose the Adaptive Visual Prompt Generation (AVPG) method, which adaptively constructs a semantically rich top-view map. It enables the agent to directly use the spatial information contained in the top-view map for thorough reasoning. We also design a Dynamic Map Scaling (DMS) mechanism that dynamically zooms the top-view map to preferred scales, enhancing local fine-grained reasoning. Additionally, we devise a Potential Target Driven (PTD) mechanism that predicts and exploits likely target locations, facilitating global, human-like exploration. Experiments on the MP3D and HM3D datasets demonstrate the superiority of TopV-Nav.
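The abstract describes Dynamic Map Scaling and target prediction only at a high level. The following is a minimal, hypothetical sketch of one way to zoom a grid-based top-view map around a point of interest and to map a location picked in the zoomed view (for example, a predicted target) back to global map cells and metres. It is not the authors' implementation: the `TopViewMap` class, the integer label grid, nearest-neighbour upsampling, the -1 padding value and the row/column-to-metres convention are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TopViewMap:
    """Hypothetical global top-view grid; each cell covers `cell_size` metres."""
    grid: list[list[int]]  # assumed: one semantic label id per cell (0 = free/unknown)
    cell_size: float       # metres per cell


def zoom_window(m: TopViewMap, center: tuple[int, int], half: int, factor: int):
    """Crop a (2*half+1) x (2*half+1) window around `center` and upsample it by
    `factor` using nearest-neighbour repetition (a stand-in for map zooming).
    Cells outside the map are padded with -1. Returns the zoomed view and the
    global (row, col) of the window's top-left cell."""
    r0, c0 = center[0] - half, center[1] - half
    rows, cols = len(m.grid), len(m.grid[0])
    out = []
    for r in range(r0, r0 + 2 * half + 1):
        row = []
        for c in range(c0, c0 + 2 * half + 1):
            v = m.grid[r][c] if 0 <= r < rows and 0 <= c < cols else -1
            row.extend([v] * factor)  # repeat horizontally
        for _ in range(factor):       # repeat vertically
            out.append(list(row))
    return out, (r0, c0)


def zoomed_to_global(pixel: tuple[int, int], origin: tuple[int, int], factor: int):
    """Map a (row, col) pixel chosen in the zoomed view back to a global map cell."""
    return origin[0] + pixel[0] // factor, origin[1] + pixel[1] // factor


def cell_to_world(cell: tuple[int, int], cell_size: float):
    """Return the (x, y) centre of a grid cell in metres (assumed: col -> x, row -> y)."""
    return (cell[1] + 0.5) * cell_size, (cell[0] + 0.5) * cell_size
```

For example, zooming a 10 x 10 map around cell (5, 5) with `half=2` and `factor=4` yields a 20 x 20 view whose top-left corner is global cell (3, 3), so zoomed pixel (5, 13) maps back to global cell (4, 6). Nearest-neighbour repetition keeps label ids intact, which a smoothing interpolation would not.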

Linqing Zhong, Chen Gao, Zihan Ding, Yue Liao, Huimin Ma, Shifeng Zhang, Xu Zhou, Si Liu • 2024

Related benchmarks

Task	Dataset	Metric	Result
Object Goal Navigation	MP3D	Success Rate (SR, %)	35.2	129
Object Navigation	HM3D	Success Rate (SR, %)	45.9	110
ObjectGoal Navigation	MP3D (val)	Success Rate (SR, %)	35.2	68
Object Goal Navigation	HM3D 0.1	Success Rate (SR, %)	53	35
Object Navigation	HM3D v1 (val)	Success Rate (SR, %)	52	32
Object Navigation	HM3D v0.1	Success Rate (SR, %)	53	18
Navigation	CityNav (test unseen)	Navigation Error (NE)	79.5	14
Navigation	CityNav unseen (val)	Navigation Error (NE)	77.4	14
Navigation	CityNav seen (val)	Navigation Error (NE)	102.7	14
Object Goal Navigation	HM3D (1000 episodes)	Success Rate (SR, %)	45.9	13
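The table reports two kinds of metrics: Success Rate (SR), where higher is better, and Navigation Error (NE), where lower is better. As a rough sketch, SR is the percentage of evaluation episodes counted as successful, and NE is an average distance between where the agent stops and the goal. The helpers below assume each episode's success flag has already been decided by the benchmark (ObjectNav benchmarks typically require the agent to stop within a distance threshold of the target, and the exact rule differs between datasets). As a simplification, they measure NE as straight-line distance. The `Episode` record, the metre units, and both function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical per-episode record; positions are (x, y) in metres (assumed)."""
    success: bool                   # success flag as judged by the benchmark
    final_pos: tuple[float, float]  # where the agent stopped
    goal_pos: tuple[float, float]   # goal location


def success_rate(episodes: list[Episode]) -> float:
    """SR in percent: share of episodes marked as successful."""
    if not episodes:
        raise ValueError("need at least one episode")
    return 100.0 * sum(e.success for e in episodes) / len(episodes)


def navigation_error(episodes: list[Episode]) -> float:
    """Mean straight-line distance between final position and goal (simplified NE)."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(math.dist(e.final_pos, e.goal_pos) for e in episodes) / len(episodes)
```

With one successful episode ending 1 m from its goal and one failed episode ending 5 m away, these helpers give an SR of 50.0 and an NE of 3.0.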
