MWM: Mobile World Models for Action-Conditioned Consistent Prediction
About
World models enable planning in an imagined space of predicted futures, offering a promising framework for embodied navigation. However, existing navigation world models often lack action-conditioned consistency, so visually plausible predictions can still drift under multi-step rollout and degrade planning. Moreover, efficient deployment requires few-step diffusion inference, but existing distillation methods do not explicitly preserve rollout consistency, creating a training-inference mismatch. To address these challenges, we propose MWM, a mobile world model for planning-based image-goal navigation. Specifically, we introduce a two-stage training framework that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training to improve action-conditioned rollout consistency. We further introduce Inference-Consistent State Distillation (ICSD) for few-step diffusion distillation with improved rollout consistency. Experiments on benchmark and real-world tasks demonstrate consistent gains in visual fidelity, trajectory accuracy, planning success, and inference efficiency. Code: https://github.com/AIGeeksGroup/MWM. Website: https://aigeeksgroup.github.io/MWM.
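To make the rollout-drift problem concrete, here is a toy sketch in Python. It is not MWM or any part of its code: the 1-D state, the `true_step` and `biased_step` functions, and the 0.01 per-step bias are all hypothetical, chosen only to show how a small one-step error compounds when each prediction is fed back as the next input.

```python
from typing import Callable, List


def rollout(step: Callable[[float, float], float],
            x0: float,
            actions: List[float]) -> List[float]:
    """Autoregressive rollout: each predicted state becomes the next input."""
    states = [x0]
    for a in actions:
        states.append(step(states[-1], a))
    return states


def true_step(x: float, a: float) -> float:
    # Hypothetical ground-truth dynamics: the action is a displacement.
    return x + a


def biased_step(x: float, a: float) -> float:
    # Hypothetical learned model with a small, constant one-step error.
    return x + a + 0.01


actions = [0.5] * 16  # same action repeated over a 16-step horizon
truth = rollout(true_step, 0.0, actions)
pred = rollout(biased_step, 0.0, actions)

# Absolute error at each step of the horizon (index 0 is the shared start).
errors = [abs(p - t) for p, t in zip(pred, truth)]

for horizon in (1, 2, 4, 8, 16):
    print(f"horizon {horizon:2d}: error {errors[horizon]:.3f}")
```

In this toy setting each single step looks nearly correct, yet the error grows roughly linearly with the horizon, which is why a planner that scores long imagined trajectories is sensitive to per-step consistency.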
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Goal Conditioned Visual Navigation | SCAND | ATE | 1.14 | 18 |
| Visual Fidelity | SCAND | FID | 80.97 | 15 |
| Action-Conditioned Consistency | SCAND 1s horizon | LPIPS | 0.368 | 3 |
| Action-Conditioned Consistency | SCAND 2s horizon | LPIPS | 0.395 | 3 |
| Action-Conditioned Consistency | SCAND 4s horizon | LPIPS | 0.421 | 3 |
| Action-Conditioned Consistency | SCAND 8s horizon | LPIPS | 0.459 | 3 |
| Action-Conditioned Consistency | SCAND 16s horizon | LPIPS | 0.495 | 3 |
| Goal-image Navigation | Real-world University Building | Success Rate (SR) | 30 | 3 |
| Inference Efficiency | SCAND | Average Rollout Time (s) | 2.3 | 3 |