MWM: Mobile World Models for Action-Conditioned Consistent Prediction

About

World models enable planning in an imagined space of predicted future states, offering a promising framework for embodied navigation. However, existing navigation world models often lack action-conditioned consistency, so visually plausible predictions can still drift under multi-step rollout and degrade planning. Moreover, efficient deployment requires few-step diffusion inference, but existing distillation methods do not explicitly preserve rollout consistency, creating a training-inference mismatch. To address these challenges, we propose MWM, a mobile world model for planning-based image-goal navigation. Specifically, we introduce a two-stage training framework that combines structure pretraining with Action-Conditioned Consistency (ACC) post-training to improve action-conditioned rollout consistency. We further introduce Inference-Consistent State Distillation (ICSD) for few-step diffusion distillation with improved rollout consistency. Our experiments on benchmark and real-world tasks demonstrate consistent gains in visual fidelity, trajectory accuracy, planning success, and inference efficiency. Code: https://github.com/AIGeeksGroup/MWM. Website: https://aigeeksgroup.github.io/MWM.
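To make "planning in imagined space" concrete, here is a minimal toy sketch of planning by rollout. It is not the MWM implementation: the abstract does not specify the planner, and MWM predicts images rather than positions. The functions `toy_world_model`, `rollout`, and `plan`, the 2-D state, and the random-shooting search are all assumptions chosen for illustration.

```python
import numpy as np

def toy_world_model(state, action):
    """Hypothetical stand-in for a learned world model.

    A state is a 2-D position and an action is a 2-D displacement.
    A real navigation world model would predict future images instead.
    """
    return state + action

def rollout(state, actions, model=toy_world_model):
    """Apply the model autoregressively, feeding each prediction back in."""
    for action in actions:
        state = model(state, action)
    return state

def plan(start, goal, num_candidates=256, horizon=8, seed=0):
    """Random-shooting planner (an assumed, simple choice).

    Samples candidate action sequences, imagines each outcome with the
    world model, and keeps the sequence that ends closest to the goal.
    """
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, 2))
    costs = [np.linalg.norm(rollout(start, seq) - goal) for seq in candidates]
    best = int(np.argmin(costs))
    return candidates[best], float(costs[best])

start = np.zeros(2)
goal = np.array([3.0, -2.0])
best_actions, best_cost = plan(start, goal)
```

Because every prediction is fed back into the model, any error in a single step is carried through the rest of the rollout, which is why consistency over multi-step rollouts matters for the quality of the selected plan.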

Han Yan, Zishang Xiang, Zeyu Zhang, Hao Tang • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Goal Conditioned Visual Navigation | SCAND | ATE | 1.14 | 18 |
| Visual Fidelity | SCAND | FID | 80.97 | 15 |
| Action-Conditioned Consistency | SCAND 1s horizon | LPIPS | 0.368 | 3 |
| Action-Conditioned Consistency | SCAND 2s horizon | LPIPS | 0.395 | 3 |
| Action-Conditioned Consistency | SCAND 4s horizon | LPIPS | 0.421 | 3 |
| Action-Conditioned Consistency | SCAND 8s horizon | LPIPS | 0.459 | 3 |
| Action-Conditioned Consistency | SCAND 16s horizon | LPIPS | 0.495 | 3 |
| Goal-image Navigation | Real-world University Building | Success Rate (SR) | 30 | 3 |
| Inference Efficiency | SCAND | Average Rollout Time (s) | 2.3 | 3 |