
GeoNav: Empowering MLLMs with dual-scale geospatial reasoning for language-goal aerial navigation

About

Language-goal aerial navigation requires UAVs to localize targets in complex outdoor environments, such as urban blocks, based on textual instructions. Indoor methods often scale poorly to urban scenes because of ambiguous objects, a limited field of view, and the demands of spatial reasoning. In this work, we propose GeoNav, a multimodal agent for long-range aerial navigation with geospatial awareness. GeoNav operates in three phases (landmark navigation, target search, and precise localization), mimicking the coarse-to-fine pattern of human spatial reasoning. To support this reasoning, it dynamically builds dual-scale spatial representations. The first is a global but schematic cognitive map, which fuses prior geographic knowledge and embodied visual cues into a top-down, explicitly annotated form. It enables fast navigation to the landmark region through intuitive map-based reasoning. The second is a local but fine-grained scene graph that represents hierarchical spatial relationships between landmarks and objects and is used for accurate target localization. On top of this structured memory, GeoNav employs a spatial chain-of-thought mechanism that gives MLLMs efficient and interpretable decision-making across stages. On the CityNav benchmark, GeoNav surpasses the current SOTA by up to 18.4% in success rate and significantly reduces navigation error. Ablation studies highlight the importance of each module, positioning structured spatial perception as the key to advanced UAV navigation. Published in Pattern Recognition, 2026.
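The three-phase, coarse-to-fine flow and the hierarchical scene graph described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `Phase`, `SceneNode`, `next_phase`, and `find_object`, the 50 m arrival radius, and the transition conditions are all hypothetical, and the MLLM-driven reasoning is left out entirely.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Tuple


class Phase(Enum):
    """The three phases named in the abstract."""
    LANDMARK_NAVIGATION = auto()
    TARGET_SEARCH = auto()
    PRECISE_LOCALIZATION = auto()


@dataclass
class SceneNode:
    """One node of a hierarchical scene graph: a landmark with child objects (hypothetical layout)."""
    name: str
    position: Tuple[float, float]
    children: List["SceneNode"] = field(default_factory=list)


def find_object(root: SceneNode, name: str) -> Optional[SceneNode]:
    """Depth-first lookup of an object by name under a landmark node."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find_object(child, name)
        if hit is not None:
            return hit
    return None


def next_phase(phase: Phase, dist_to_landmark: float, target_in_graph: bool,
               arrive_radius: float = 50.0) -> Phase:
    """Advance coarse-to-fine; the conditions and radius are assumptions, not from the paper."""
    if phase is Phase.LANDMARK_NAVIGATION and dist_to_landmark <= arrive_radius:
        return Phase.TARGET_SEARCH
    if phase is Phase.TARGET_SEARCH and target_in_graph:
        return Phase.PRECISE_LOCALIZATION
    return phase
```

In this sketch the agent stays in landmark navigation until it is within the assumed radius of the landmark, then searches until the target appears in the scene graph, and only then moves on to precise localization.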

Haotian Xu, Yue Hu, Chen Gao, Zhengqiu Zhu, Yong Zhao, Yong Li, Quanjun Yin • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| Navigation | CityNav (test unseen) | Navigation Error (NE): 73.5 | 14 |
| Navigation | CityNav unseen (val) | Navigation Error (NE): 64.1 | 14 |
| Navigation | CityNav seen (val) | Navigation Error (NE): 58.6 | 14 |
| Vision-Language Navigation | CityNav Easy | Navigation Error (NE): 59.86 | 6 |
| Vision-Language Navigation | CityNav Medium | Navigation Error (Path Length): 53.8 | 6 |
| Vision-Language Navigation | CityNav Hard | Navigation Error (NE): 68.9 | 6 |
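
The page does not define its metrics. As a rough sketch, assuming navigation error is the Euclidean distance between the UAV's final position and the target, and that an episode counts as a success when that distance falls within a threshold, the two quantities mentioned in the abstract could be computed as follows. The function names and the 20 m default threshold are assumptions for illustration, not values taken from this page.

```python
import math
from typing import Sequence, Tuple


def navigation_error(final_xy: Tuple[float, float], target_xy: Tuple[float, float]) -> float:
    """Euclidean distance between the stop position and the target (assumed definition)."""
    return math.dist(final_xy, target_xy)


def success_rate(errors: Sequence[float], threshold_m: float = 20.0) -> float:
    """Fraction of episodes whose error is within the threshold (threshold is an assumption)."""
    if not errors:
        return 0.0
    return sum(e <= threshold_m for e in errors) / len(errors)
```

Under these assumptions, lower navigation error is better and higher success rate is better.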