StepNav: Structured Trajectory Priors for Efficient and Multimodal Visual Navigation

About

Visual navigation is fundamental to autonomous systems, yet generating reliable trajectories in cluttered and uncertain environments remains a core challenge. Recent generative models promise end-to-end synthesis, but their reliance on unstructured noise priors often yields unsafe, inefficient, or unimodal plans that cannot meet real-time requirements. We propose StepNav, a novel framework that bridges this gap by introducing structured, multimodal trajectory priors derived from variational principles. StepNav first learns a geometry-aware success probability field to identify all feasible navigation corridors. These corridors are then used to construct an explicit multimodal mixture prior that initializes a conditional flow-matching process. The flow-matching refinement is formulated as an optimal control problem with explicit smoothness and safety regularization. By replacing unstructured noise with physically grounded candidates, StepNav generates safer and more efficient plans in significantly fewer steps. Experiments on both simulated and real-world benchmarks demonstrate consistent improvements in robustness, efficiency, and safety over state-of-the-art generative planners, advancing reliable trajectory generation for practical autonomous navigation. The code has been released at https://github.com/LuoXubo/StepNav.
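
The idea of starting a flow from a structured mixture prior rather than from unstructured noise can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the StepNav repository: the function names `sample_mixture_prior` and `refine`, the two hand-written candidate corridors, the equal mixture weights, and the closed-form `toy_velocity` that stands in for a learned conditional velocity network. The success probability field and the smoothness and safety regularization are omitted.

```python
import numpy as np

def sample_mixture_prior(candidates, weights, sigma, rng):
    """Draw one trajectory from a Gaussian mixture centered on candidate paths."""
    k = int(rng.choice(len(candidates), p=weights))  # pick a corridor (mode)
    noise = sigma * rng.standard_normal(candidates[k].shape)
    return candidates[k] + noise, k

def refine(x0, velocity_fn, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with forward Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

# Two hypothetical corridors passing left and right of an obstacle, shape (T, 2).
T = 8
s = np.linspace(0.0, 1.0, T)
left = np.stack([s, 0.5 * np.sin(np.pi * s)], axis=1)
right = np.stack([s, -0.5 * np.sin(np.pi * s)], axis=1)
candidates = [left, right]
weights = np.array([0.5, 0.5])  # assumed equal corridor probabilities

rng = np.random.default_rng(0)
x0, k = sample_mixture_prior(candidates, weights, sigma=0.05, rng=rng)

# Stand-in for a learned velocity field: the straight-line flow toward a
# known target trajectory. A real planner would query a trained network here.
target = candidates[k]

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

refined = refine(x0, toy_velocity, num_steps=4)
```

Because the initial sample already lies close to a feasible corridor, a handful of integration steps is enough in this toy setting; sampling repeatedly from the mixture yields trajectories from both corridors rather than a single mode.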

Xubo Luo, Aodi Wu, Haodong Han, Xue Wan, Wei Zhang, Leizheng Shu, Ruisuo Wang • 2026

Related benchmarks

Task	Dataset	Result
Point-Goal navigation	Stanford 2D-3D-S Indoor Basic Task	SR 0.95	5
Point-Goal navigation	Stanford 2D-3D-S Indoor (Adaptation Task)	Success Rate 90	5
Point-Goal navigation	Citysim Outdoor Basic Task	SR (%) 0.57	5
Point-Goal navigation	Citysim Outdoor Adaptation Task	SR 68	5
