
An Efficient and Multi-Modal Navigation System with One-Step World Model

About

Navigation is a fundamental capability for mobile robots. While the current trend is to replace traditional geometry-based methods with learning-based approaches, existing end-to-end learning-based policies often struggle with 3D spatial reasoning and lack a comprehensive understanding of physical world dynamics. Integrating world models, which predict future observations conditioned on given actions, with iterative optimization-based planning offers a promising solution because of their capacity for imagination and their flexibility. However, current navigation world models, typically built on pure transformer architectures, often rely on multi-step diffusion processes and autoregressive frame-by-frame generation. These mechanisms result in prohibitive computational latency, rendering real-time deployment impossible. To address this bottleneck, we propose a lightweight navigation world model that adopts a one-step generation paradigm and a 3D U-Net backbone equipped with efficient spatial-temporal attention. This design drastically reduces inference latency, enabling high-frequency control while achieving superior predictive performance. We also integrate this model into an optimization-based planning framework that uses anchor-based initialization to handle multi-modal goal navigation tasks. Extensive closed-loop experiments in both simulation and real-world environments demonstrate our system's superior efficiency and robustness compared to state-of-the-art baselines.

Wangtian Shen, Ziyang Meng, Jinming Ma, Mingliang Zhou, Diyun Xiang • 2026
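
The planning idea in the abstract (score candidate action sequences with a world model, starting the search from a set of anchors) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: `toy_world_model` is a hypothetical unicycle-kinematics stand-in for the learned image-predicting model, the anchors are constant-curvature arcs, the optimizer is a plain cross-entropy method, and the cost is the final distance to a point goal.

```python
import numpy as np

DT = 0.2  # assumed control period in seconds


def toy_world_model(state, actions):
    """Hypothetical stand-in for the learned world model.

    Predicts the whole future trajectory for each action sequence in one
    batched call. state: (3,) = x, y, heading. actions: (N, H, 2) = linear
    velocity and yaw rate. Returns predicted states of shape (N, H, 3).
    """
    heading = state[2] + np.cumsum(actions[..., 1] * DT, axis=-1)
    x = state[0] + np.cumsum(actions[..., 0] * np.cos(heading) * DT, axis=-1)
    y = state[1] + np.cumsum(actions[..., 0] * np.sin(heading) * DT, axis=-1)
    return np.stack([x, y, heading], axis=-1)


def goal_cost(state, actions, goal_xy):
    """Distance between each predicted final position and the point goal."""
    final_xy = toy_world_model(state, actions)[:, -1, :2]
    return np.linalg.norm(final_xy - goal_xy, axis=-1)


def make_anchors(horizon, speed=0.5, yaw_rates=(-0.6, -0.3, 0.0, 0.3, 0.6)):
    """Assumed anchor set: constant-speed arcs with different yaw rates."""
    anchors = np.zeros((len(yaw_rates), horizon, 2))
    anchors[..., 0] = speed
    anchors[..., 1] = np.asarray(yaw_rates)[:, None]
    return anchors


def plan(state, goal_xy, horizon=10, n_samples=64, n_elite=8, iters=5, seed=0):
    """Cross-entropy-method search seeded by the best-scoring anchor."""
    rng = np.random.default_rng(seed)
    anchors = make_anchors(horizon)
    anchor_costs = goal_cost(state, anchors, goal_xy)
    best = int(np.argmin(anchor_costs))
    best_actions, best_cost = anchors[best].copy(), float(anchor_costs[best])

    mean, std = best_actions.copy(), np.full((horizon, 2), 0.3)
    low, high = np.array([0.0, -1.0]), np.array([1.0, 1.0])  # assumed limits
    for _ in range(iters):
        noise = rng.standard_normal((n_samples, horizon, 2))
        samples = np.clip(mean + std * noise, low, high)
        costs = goal_cost(state, samples, goal_xy)
        # Refit the sampling distribution to the lowest-cost sequences.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
        if costs.min() < best_cost:
            best_cost = float(costs.min())
            best_actions = samples[np.argmin(costs)].copy()
    return best_actions, best_cost


if __name__ == "__main__":
    actions, cost = plan(np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.5]))
    print("first action:", actions[0], "predicted final distance:", cost)
```

In a closed-loop setting, only the first action of the returned sequence would typically be executed before replanning from the next observation. Seeding the search with anchors rather than zeros is one plausible way to start the optimizer near a reasonable motion.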

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Image-Goal Navigation | Image-Goal Navigation | SR 72.67 | 7 |
| Language-Goal Navigation | Language-Goal Navigation | Success Rate (SR) 69.33 | 6 |
| Point-Goal Navigation | Point-Goal Navigation | SR 50.67 | 6 |
| Image-Goal Navigation | Real-world (Mobile Robot Platform) (test) | Success Rate 0.8 | 4 |
| Video Generation | MP3D and Habitat | PSNR 17.611 | 4 |
| Language-Goal Navigation | Real-world (Mobile Robot Platform) (test) | Success Rate 68 | 3 |
| Point-Goal Navigation | Real-world (Mobile Robot Platform) (test) | Success Rate 76 | 3 |
