
OpenFrontier: General Navigation with Visual-Language Grounded Frontiers

About

Open-world navigation requires robots to make decisions in complex everyday environments while adapting to flexible task requirements. Conventional navigation approaches often rely on dense 3D reconstruction and hand-crafted goal metrics, which limits their generalization across tasks and environments. Recent advances in vision–language navigation (VLN) and vision–language–action (VLA) models enable end-to-end policies conditioned on natural language, but typically require interactive training, large-scale data collection, or task-specific fine-tuning with a mobile agent. We formulate navigation as a sparse subgoal identification and reaching problem and observe that providing visual anchoring targets for high-level semantic priors enables highly efficient goal-conditioned navigation. Based on this insight, we select navigation frontiers as semantic anchors and propose OpenFrontier, a training-free navigation framework that seamlessly integrates diverse vision–language prior models. OpenFrontier enables efficient navigation with a lightweight system design, without dense 3D mapping, policy training, or model fine-tuning. We evaluate OpenFrontier across multiple navigation benchmarks and demonstrate strong zero-shot performance, as well as effective real-world deployment on a mobile robot.
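The core idea — treating map frontiers as sparse subgoals and picking the one a vision–language prior scores highest — can be illustrated with a minimal sketch. This is not the paper's implementation: the 2D occupancy grid, the 4-connected frontier test, and the `score_fn` placeholder (standing in for a vision–language model's semantic score) are all assumptions made for illustration.

```python
import numpy as np

# Assumed cell labels for a 2D occupancy grid (illustrative, not from the paper).
FREE, UNKNOWN, OCCUPIED = 0, -1, 1

def find_frontiers(grid):
    """Return (row, col) of free cells that border unknown space (frontiers)."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(0 <= nr < rows and 0 <= nc < cols and grid[nr, nc] == UNKNOWN
                   for nr, nc in neighbors):
                frontiers.append((r, c))
    return frontiers

def select_goal(frontiers, score_fn):
    """Pick the frontier with the highest semantic score.

    In the sketch, score_fn is a stand-in for a vision-language prior that
    rates how promising each frontier looks for the language-specified goal.
    """
    return max(frontiers, key=score_fn)

# Toy map: a 4x4 grid whose right column is unexplored.
grid = np.full((4, 4), FREE)
grid[:, 3] = UNKNOWN
grid[0, 0] = OCCUPIED

frontiers = find_frontiers(grid)          # column-2 cells border the unknown strip
goal = select_goal(frontiers, lambda f: f[0])  # dummy score: prefer larger row index
```

The navigation loop would then drive toward `goal` with any local planner, re-extract frontiers as the map grows, and repeat until the target object is observed — no dense 3D reconstruction or policy training involved.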

Esteban Padilla, Boyang Sun, Marc Pollefeys, Hermann Blum • 2026

Related benchmarks

Task                      Dataset             Result     Rank
Object Goal Navigation    HM3D ObjNav (val)   SR 77.3    15
Object Goal Navigation    MP3D (val)          SR 40.7    11
Object Goal Navigation    OVON unseen (val)   SR 39      4
