OmniNav: A Unified Framework for Prospective Exploration and Visual-Language Navigation

About

Embodied navigation presents a core challenge for intelligent robots, requiring the comprehension of visual environments, natural language instructions, and autonomous exploration. Existing models often fall short in offering a unified solution across diverse navigation paradigms, resulting in low success rates and limited generalization. We introduce OmniNav, a unified framework addressing instruct-goal, object-goal, point-goal navigation, and frontier-based exploration within a single architecture. Our approach features a lightweight, low-latency policy that accurately predicts continuous-space waypoints (coordinates and orientations). This policy surpasses action-chunk methods in precision and supports real-world deployment at control frequencies up to 5 Hz. Architecturally, OmniNav employs a fast-slow system design: a fast module generates waypoints using short-horizon visual context and subtasks, while a slow module performs deliberative planning with long-horizon observations and candidate frontiers to select subsequent subgoals and subtasks. This collaboration enhances path efficiency and maintains trajectory coherence, particularly in exploration and memory-intensive scenarios. Crucially, we identify that the primary bottleneck isn't merely navigation policy learning, but a robust understanding of general instructions and objects. To boost generalization, OmniNav integrates large-scale, general-purpose training datasets, including those for image captioning and visual recognition, into a joint multi-task regimen. This significantly improves success rates and robustness. Extensive experiments confirm OmniNav's state-of-the-art performance across various navigation benchmarks, with real-world deployment further validating its efficacy. OmniNav provides practical insights for embodied navigation, charting a scalable path towards versatile, highly generalizable robotic intelligence.
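The fast-slow split described in the abstract can be illustrated with a toy control loop. This is a minimal sketch, not OmniNav's implementation: the learned slow and fast modules are replaced here by simple geometric heuristics (nearest unvisited frontier, fixed-length step toward the subgoal), and every name in it (`Waypoint`, `slow_select_frontier`, `fast_waypoint`, `run`, the `step` parameter) is hypothetical. It also assumes the slow module is consulted only when the current subgoal has been reached, which is a simplification.

```python
import math
from dataclasses import dataclass


@dataclass
class Waypoint:
    """Continuous-space waypoint: planar coordinates plus heading."""
    x: float
    y: float
    yaw: float  # heading in radians


def slow_select_frontier(position, frontiers, visited):
    """Stand-in for the slow module (assumption): choose the nearest
    unvisited candidate frontier as the next subgoal."""
    candidates = [f for f in frontiers if f not in visited]
    if not candidates:
        return None
    return min(candidates, key=lambda f: math.dist(position, f))


def fast_waypoint(position, subgoal, step=0.5):
    """Stand-in for the fast module (assumption): emit a waypoint at most
    `step` metres toward the subgoal, facing the direction of travel."""
    dx = subgoal[0] - position[0]
    dy = subgoal[1] - position[1]
    dist = math.hypot(dx, dy)
    yaw = math.atan2(dy, dx)
    if dist <= step:
        # Close enough: snap to the subgoal itself.
        return Waypoint(subgoal[0], subgoal[1], yaw)
    return Waypoint(position[0] + step * dx / dist,
                    position[1] + step * dy / dist,
                    yaw)


def run(start, frontiers, max_ticks=100):
    """Alternate slow subgoal selection with fast waypoint generation."""
    position = start
    visited = set()
    subgoal = None
    path = [start]
    for _ in range(max_ticks):
        if subgoal is None:
            # Slow path: deliberate only when a new subgoal is needed.
            subgoal = slow_select_frontier(position, frontiers, visited)
            if subgoal is None:
                break  # nothing left to explore
        # Fast path: one waypoint per control tick.
        wp = fast_waypoint(position, subgoal)
        position = (wp.x, wp.y)
        path.append(position)
        if position == subgoal:
            visited.add(subgoal)
            subgoal = None
    return path, visited


if __name__ == "__main__":
    path, visited = run((0.0, 0.0), [(1.0, 0.0), (1.0, 2.0)])
    print(path)
    print(visited)
```

In this sketch the expensive decision (which frontier to pursue) is made rarely, while the cheap waypoint step runs on every tick; that separation is the property the abstract attributes to the fast-slow design.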

Xinda Xue, Junjun Hu, Minghua Luo, Shichao Xie, Jintao Chen, Zixun Xie, Kuichen Quan, Wei Guo, Mu Xu, Zedong Chu • 2025

Related benchmarks

Task	Dataset	Metric	Result	
Vision-Language Navigation	RxR-CE (val-unseen)	SR	73.6	512
Vision-Language Navigation	VLN-CE R2R (val unseen)	Navigation Error (NE)	3.74	76
Object Navigation	HM3D v2	Success Rate (SR)	56.1	30
Vision-and-Language Navigation	HM3D Simulation	SR (B)	90.63	18
Open-Vocabulary Navigation	HM3D OVON	Success Rate (SR)	59.2	8
Navigation	POINav-Bench (test)	SR (2m)	34.36	4
POI-Goal Navigation	BridgeNav Dataset (test)	SR (0.1m)	18.78	4
Vision-Language Navigation	BridgeNav	Success Rate (0.1m)	18.78	4

