StepNav: Structured Trajectory Priors for Efficient and Multimodal Visual Navigation
About
Visual navigation is fundamental to autonomous systems, yet generating reliable trajectories in cluttered and uncertain environments remains a core challenge. Recent generative models promise end-to-end synthesis, but their reliance on unstructured noise priors often yields unsafe, inefficient, or unimodal plans that cannot meet real-time requirements. We propose StepNav, a novel framework that bridges this gap by introducing structured, multimodal trajectory priors derived from variational principles. StepNav first learns a geometry-aware success probability field to identify all feasible navigation corridors. These corridors are then used to construct an explicit multimodal mixture prior that initializes a conditional flow-matching process. The flow-matching refinement is formulated as an optimal control problem with explicit smoothness and safety regularization. By replacing unstructured noise with physically grounded candidates, StepNav generates safer and more efficient plans in significantly fewer steps. Experiments on both simulated and real-world benchmarks demonstrate consistent improvements in robustness, efficiency, and safety over state-of-the-art generative planners, advancing reliable trajectory generation for practical autonomous navigation. The code is available at https://github.com/LuoXubo/StepNav.
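The idea of starting a flow from a structured mixture prior rather than unstructured noise can be illustrated with a small sketch. This is not the released StepNav code: the names `sample_mixture_prior`, `euler_refine`, and `toy_velocity`, the Gaussian form of the mixture, the parameter `sigma`, and the hand-written candidate paths are all assumptions made for illustration. In particular, `toy_velocity` is a hypothetical stand-in for a learned velocity field, and the candidates here are fixed by hand rather than extracted from a success probability field.

```python
import numpy as np


def sample_mixture_prior(candidates, weights, sigma, rng):
    """Draw one trajectory from a Gaussian mixture centred on candidate paths.

    candidates: (K, T, 2) array of K candidate trajectories with T waypoints.
    weights:    (K,) non-negative mixture weights (normalized here).
    sigma:      standard deviation of the isotropic noise around each candidate.
    Returns the sampled trajectory and the index of the chosen mode.
    """
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()
    k = rng.choice(len(candidates), p=probs)
    noise = sigma * rng.standard_normal(candidates[k].shape)
    return candidates[k] + noise, k


def euler_refine(x0, velocity_fn, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


def toy_velocity(target):
    """Hypothetical stand-in for a learned velocity field.

    Implements a straight-line flow that transports any start point to
    `target` at t=1. A real planner would use a trained network here.
    """
    def v(x, t):
        return (target - x) / (1.0 - t)
    return v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hand-written candidate corridors (assumed, for illustration only).
    left = np.linspace([0.0, 0.0], [1.0, 1.0], 5)
    right = np.linspace([0.0, 0.0], [1.0, -1.0], 5)
    candidates = np.stack([left, right])

    x0, k = sample_mixture_prior(candidates, [0.7, 0.3], sigma=0.05, rng=rng)
    refined = euler_refine(x0, toy_velocity(candidates[k]), num_steps=4)
    print("chosen mode:", k)
    print("max deviation from target:", np.abs(refined - candidates[k]).max())
```

Because the initial sample already lies close to a feasible candidate, the flow only has to make a small correction, which is one intuition for why a structured prior may need fewer integration steps than a flow started from pure noise.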
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Point-Goal navigation | Stanford 2D-3D-S Indoor Basic Task | SR | 0.95 | 5 |
| Point-Goal navigation | Stanford 2D-3D-S Indoor (Adaptation Task) | Success Rate | 90 | 5 |
| Point-Goal navigation | Citysim Outdoor Basic Task | SR (%) | 0.57 | 5 |
| Point-Goal navigation | Citysim Outdoor Adaptation Task | SR | 68 | 5 |