Hydra-Nav: Object Navigation via Adaptive Dual-Process Reasoning

About

While large vision-language models (VLMs) show promise for object goal navigation, current methods still struggle with low success rates and inefficient localization of unseen objects, failures primarily attributed to weak temporal-spatial reasoning. Meanwhile, recent attempts to inject reasoning into VLM-based agents improve success rates but incur substantial computational overhead. To address both the ineffectiveness and inefficiency of existing approaches, we introduce Hydra-Nav, a unified VLM architecture that adaptively switches between a deliberative slow system for analyzing exploration history and formulating high-level plans, and a reactive fast system for efficient execution. We train Hydra-Nav through a three-stage curriculum: (i) spatial-action alignment to strengthen trajectory planning, (ii) memory-reasoning integration to enhance temporal-spatial reasoning over long-horizon exploration, and (iii) iterative rejection fine-tuning to enable selective reasoning at critical decision points. Extensive experiments demonstrate that Hydra-Nav achieves state-of-the-art performance on the HM3D, MP3D, and OVON benchmarks, outperforming the second-best methods by 11.1%, 17.4%, and 21.2%, respectively. Furthermore, we introduce SOT (Success weighted by Operation Time), a new metric to measure search efficiency across VLMs with varying reasoning intensity. Results show that adaptive reasoning significantly enhances search efficiency over fixed-frequency baselines.
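
The switching behavior described in the abstract can be pictured with a toy control loop. This is only an illustrative sketch, not the authors' implementation: the class name `DualProcessAgent`, the three callables, and the replanning trigger are all hypothetical stand-ins, and the abstract does not say how Hydra-Nav actually decides when to invoke the slow system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DualProcessAgent:
    """Toy dual-process loop; every name here is a hypothetical stand-in."""
    slow_plan: Callable[[List[str]], str]     # exploration history -> high-level plan
    fast_act: Callable[[str, str], str]       # (plan, observation) -> low-level action
    needs_replan: Callable[[str, str], bool]  # assumed trigger for the slow system
    history: List[str] = field(default_factory=list)
    plan: str = ""
    slow_calls: int = 0

    def step(self, observation: str) -> str:
        self.history.append(observation)
        # Invoke the expensive deliberative path only when there is no plan
        # yet or the (assumed) trigger fires; otherwise stay on the fast path.
        if not self.plan or self.needs_replan(self.plan, observation):
            self.plan = self.slow_plan(self.history)
            self.slow_calls += 1
        return self.fast_act(self.plan, observation)


def demo():
    agent = DualProcessAgent(
        slow_plan=lambda hist: f"explore after {len(hist)} observations",
        fast_act=lambda plan, obs: "stop" if obs == "goal" else "forward",
        needs_replan=lambda plan, obs: obs == "dead_end",
    )
    actions = [agent.step(o) for o in ["hall", "hall", "dead_end", "hall", "goal"]]
    return actions, agent.slow_calls
```

In this toy run the slow planner is called only twice across five steps (once at the start and once at the dead end), which is the kind of saving that selective reasoning is meant to give over reasoning at every step.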

Zixuan Wang, Huang Fang, Shaoan Wang, Yuanfei Luo, Heng Dong, Wei Li, Yiming Gan • 2026

Related benchmarks

Task                                Dataset               Metric         Result   Rank
ObjectGoal Navigation               MP3D (val)            Success Rate   64       68
Object Goal Navigation              HM3D (val)            SR             84.8     21
Object Navigation                   OVON unseen (val)     SR             66.3     12
Open-Vocabulary Object Navigation   OVON unseen (val)     SR             66.3     10
Open-Vocabulary Object Navigation   OVON seen (val)       SR             65       9
Open-Vocabulary Object Navigation   OVON Synonyms (val)   SR             63.9     9
Object Navigation                   HM3D (val)            SR             84.3     4