AR Forcing: Towards Long-Horizon Robot Navigation World Model

About

Diffusion-based robot navigation world models are typically trained with parallel supervision, whereas path planning relies on autoregressive inference. This mismatch creates a distribution shift between training and inference that destabilizes long-horizon prediction. We propose AR Forcing, an autoregressive training strategy that integrates the standard diffusion loss into the autoregressive training loop. At each step, the model updates the context with its own predictions and optimizes the single-step noise prediction objective, which explicitly exposes it to the inference-time state distribution during training. Our method requires no additional discriminators or distribution-matching losses, retains the original diffusion framework and sampler, and is easy to integrate. Experiments on multi-domain navigation datasets (RECON, SCAND, HuRoN, TartanDrive) show that, compared with strong baselines, AR Forcing improves both the consistency of generated images during long-horizon navigation and the accuracy of predicted trajectories, making the model more robust in complex known and unknown environments. We will release the code soon.
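The training loop described above can be illustrated with a toy sketch. This is not the authors' implementation (the code has not been released). It assumes scalar "frames", a hypothetical linear denoiser `toy_denoiser` with a single parameter `w`, and a one-step clean-frame estimate standing in for the real sampler. It only shows the structure: a single-step noise prediction loss is computed at every step, and the context is refreshed with the model's own prediction rather than the ground truth.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion for one scalar frame: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps


def toy_denoiser(x_t, alpha_bar, context, w):
    """Hypothetical noise predictor (an assumption for illustration only).

    It guesses the clean next frame as w * (last context frame) and converts
    that guess into a noise estimate.
    """
    x0_hat = w * context[-1]
    return (x_t - math.sqrt(alpha_bar) * x0_hat) / math.sqrt(1.0 - alpha_bar)


def predict_x0(x_t, eps_hat, alpha_bar):
    """One-step clean-frame estimate from a noise prediction (stand-in for a sampler)."""
    return (x_t - math.sqrt(1.0 - alpha_bar) * eps_hat) / math.sqrt(alpha_bar)


def ar_rollout_loss(frames, context_len, w, rng, self_feed=True):
    """Mean single-step noise prediction loss over an autoregressive rollout.

    self_feed=True  : the context is updated with the model's own prediction.
    self_feed=False : the context is updated with the ground-truth frame
                      (teacher forcing, shown here only for comparison).
    """
    context = list(frames[:context_len])
    losses = []
    for target in frames[context_len:]:
        alpha_bar = rng.uniform(0.1, 0.9)   # random noise level for this step
        eps = rng.gauss(0.0, 1.0)           # ground-truth noise
        x_t = add_noise(target, eps, alpha_bar)
        eps_hat = toy_denoiser(x_t, alpha_bar, context, w)
        losses.append((eps_hat - eps) ** 2)  # standard diffusion (noise MSE) loss
        if self_feed:
            next_frame = predict_x0(x_t, eps_hat, alpha_bar)
        else:
            next_frame = target
        context = context[1:] + [next_frame]  # slide the context window
    return sum(losses) / len(losses)
```

In this sketch, calling `ar_rollout_loss` with `self_feed=True` lets prediction errors accumulate in the context, so the loss reflects the states the model would actually encounter at inference time; with `self_feed=False` the context always stays on the ground-truth trajectory. In a real setup the loss would be backpropagated through a neural denoiser, which this toy version omits.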

Yifei Yang, Zehua Fan, Huan Li, Aoqi Wang, Lida Huang, Haibao Yu, Haiyan Liu, Xuanyao Mao, Jason Bao, Liang Xu, Bingchuan Sun, Yan Wang • 2026

Related benchmarks

Task	Dataset	Result
Long-horizon prediction	RECON	LPIPS 0.261	10
Long-horizon prediction	TartanDrive	LPIPS 0.334	10
Long-horizon prediction	SCAND	LPIPS 0.396	10
Long-horizon prediction	HuRoN	LPIPS 0.27	10
Goal Conditioned Visual Navigation	Goal-Conditioned Visual Navigation 2 seconds horizon	ATE 1.22	6
Goal Conditioned Visual Navigation	RECON (4s horizon)	ATE 1.69	2
Goal Conditioned Visual Navigation	RECON (8s horizon)	ATE 5.8	2
Goal Conditioned Visual Navigation	HuRoN 4s horizon	ATE 9.23	2
Goal Conditioned Visual Navigation	HuRoN 8s horizon	ATE 28.76	2
Goal Conditioned Visual Navigation	HuRoN 16s horizon	ATE 56.7	2

Showing 10 of 17 rows
