An Efficient and Multi-Modal Navigation System with One-Step World Model
About
Navigation is a fundamental capability for mobile robots. While the current trend is to replace traditional geometry-based methods with learning-based approaches, existing end-to-end learned policies often struggle with 3D spatial reasoning and lack a comprehensive understanding of physical world dynamics. Combining world models, which predict future observations conditioned on given actions, with iterative optimization-based planning offers a promising solution because world models can imagine outcomes and adapt flexibly to different tasks. However, current navigation world models, typically built on pure transformer architectures, often rely on multi-step diffusion processes and autoregressive frame-by-frame generation. These mechanisms incur prohibitive computational latency, which makes real-time deployment impossible. To address this bottleneck, we propose a lightweight navigation world model that adopts a one-step generation paradigm and a 3D U-Net backbone equipped with efficient spatial-temporal attention. This design drastically reduces inference latency, enabling high-frequency control while achieving superior predictive performance. We also integrate this model into an optimization-based planning framework that uses anchor-based initialization to handle multi-modal goal navigation tasks. Extensive closed-loop experiments in both simulation and real-world environments demonstrate our system's superior efficiency and robustness compared to state-of-the-art baselines.
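The planning loop described in the abstract (start from a set of anchor action sequences, score each by imagining its outcome, then refine iteratively) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: `rollout` uses simple unicycle kinematics as a stand-in for the learned one-step world model, the cost is plain distance to a point goal, the anchors are constant-turn-rate sequences, and the refinement is a basic random-perturbation search, which may differ from the optimizer used in the paper.

```python
import math
import random


def rollout(start, actions, dt=0.5):
    """Hypothetical stand-in for a world model: unicycle kinematics.

    Takes a whole action sequence of (speed, turn_rate) pairs and returns
    the predicted final pose (x, y, heading) in a single call.
    """
    x, y, th = start
    for v, w in actions:
        th += w * dt
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
    return x, y, th


def cost(start, actions, goal):
    """Distance between the imagined final position and the point goal."""
    x, y, _ = rollout(start, actions)
    return math.hypot(goal[0] - x, goal[1] - y)


def make_anchors(horizon, speed=0.5, turn_rates=(-0.6, -0.3, 0.0, 0.3, 0.6)):
    """Assumed anchor set: one constant-turn-rate sequence per turn rate."""
    return [[(speed, w)] * horizon for w in turn_rates]


def plan(start, goal, horizon=8, iters=20, samples=32, sigma=0.2, seed=0):
    """Pick the best anchor, then refine it by random perturbation."""
    rng = random.Random(seed)
    best = min(make_anchors(horizon), key=lambda a: cost(start, a, goal))
    best_cost = cost(start, best, goal)
    for _ in range(iters):
        for _ in range(samples):
            # Perturb every action; keep forward speed non-negative.
            cand = [
                (max(0.0, v + rng.gauss(0.0, sigma)), w + rng.gauss(0.0, sigma))
                for v, w in best
            ]
            c = cost(start, cand, goal)
            if c < best_cost:  # accept only improvements
                best, best_cost = cand, c
        sigma *= 0.9  # shrink the search radius over time
    return best, best_cost


if __name__ == "__main__":
    actions, final_cost = plan(start=(0.0, 0.0, 0.0), goal=(1.5, 1.0))
    print(f"first action: {actions[0]}, remaining distance: {final_cost:.3f}")
```

Because the anchors cover several distinct directions, the search starts near a reasonable mode instead of from a single arbitrary guess. In a closed-loop setting, only the first action would typically be executed before replanning.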
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image-Goal Navigation | Image-Goal Navigation | Success Rate (SR) | 72.67 | 7 |
| Language-Goal Navigation | Language-Goal Navigation | Success Rate (SR) | 69.33 | 6 |
| Point-Goal Navigation | Point-Goal Navigation | Success Rate (SR) | 50.67 | 6 |
| Image-Goal Navigation | Real-world (Mobile Robot Platform) (test) | Success Rate (SR) | 0.8 | 4 |
| Video Generation | MP3D and Habitat | PSNR | 17.611 | 4 |
| Language-Goal Navigation | Real-world (Mobile Robot Platform) (test) | Success Rate (SR) | 68 | 3 |
| Point-Goal Navigation | Real-world (Mobile Robot Platform) (test) | Success Rate (SR) | 76 | 3 |