
AR Forcing: Towards Long-Horizon Robot Navigation World Model

About

Diffusion-based robot navigation world models are typically trained with parallel supervision but run autoregressively at inference time during path planning. This mismatch creates a distribution shift between training and inference that destabilizes long-horizon prediction. We propose AR Forcing, an autoregressive training strategy that integrates the standard diffusion loss into the autoregressive training loop. At each step, the model uses its own predictions to update the context and optimizes the single-step noise-prediction objective, which explicitly exposes it to the inference-time state distribution during training. Our method requires no additional discriminators or distribution-matching losses, retains the original diffusion framework and sampler, and is easy to integrate. Experiments on multi-domain navigation datasets (RECON, SCAND, HuRoN, TartanDrive) show that, compared with strong baselines, AR Forcing improves the consistency of generated images over long-horizon navigation and the accuracy of predicted trajectories, making the model more robust in complex known and unknown environments. We will release the code soon.
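
The training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `alpha_bar` and `ar_forcing_loss`, the cosine-style noise schedule, the representation of frames as lists of floats, and the use of a one-step clean-frame estimate as "the model's own prediction" are all hypothetical choices made here. A real training setup would also backpropagate the loss; this sketch only computes it.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Cumulative signal level in (0, 1). Illustrative cosine-style schedule (assumed)."""
    return math.cos(0.5 * math.pi * (t + 1) / (num_steps + 1)) ** 2


def ar_forcing_loss(model, frames, context_len, num_steps=10, rng=None):
    """Mean single-step noise-prediction loss over an autoregressive rollout.

    model(noisy, t, context) -> predicted noise (hypothetical interface).
    The context starts from ground-truth frames and is then updated with the
    model's own predictions rather than with ground truth.
    """
    rng = rng or random.Random(0)
    context = list(frames[:context_len])
    total, count = 0.0, 0
    for target in frames[context_len:]:
        # Standard diffusion step: noise the ground-truth target frame.
        t = rng.randrange(num_steps)
        a = alpha_bar(t, num_steps)
        eps = [rng.gauss(0.0, 1.0) for _ in target]
        noisy = [math.sqrt(a) * x + math.sqrt(1 - a) * e
                 for x, e in zip(target, eps)]

        # Single-step noise-prediction objective (mean squared error).
        eps_hat = model(noisy, t, context)
        total += sum((p - e) ** 2 for p, e in zip(eps_hat, eps)) / len(eps)
        count += 1

        # Assumption: a one-step clean-frame estimate stands in for the
        # model's prediction; a full sampler could be used here instead.
        pred = [(n - math.sqrt(1 - a) * p) / math.sqrt(a)
                for n, p in zip(noisy, eps_hat)]

        # Slide the window: the prediction, not the ground truth, enters the context.
        context = context[1:] + [pred]
    return total / count, context


# Toy usage: 2-D "frames" and a placeholder model that always predicts zero noise.
frames = [[0.1 * i, -0.1 * i] for i in range(8)]
zero_model = lambda noisy, t, context: [0.0] * len(noisy)
loss, final_context = ar_forcing_loss(zero_model, frames, context_len=3)
print(f"mean noise-prediction loss: {loss:.3f}")
```

The contrast with parallel supervision is in the last line of the loop: under parallel supervision the context would always be filled from `frames`, whereas here it is filled from `pred`, so the states seen during training resemble those reached at inference time.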

Yifei Yang, Zehua Fan, Huan Li, Aoqi Wang, Lida Huang, Haibao Yu, Haiyan Liu, Xuanyao Mao, Jason Bao, Liang Xu, Bingchuan Sun, Yan Wang • 2026

Related benchmarks

Task                                Dataset                                               Result        Rank
Long-horizon prediction             RECON                                                 LPIPS 0.261   10
Long-horizon prediction             TartanDrive                                           LPIPS 0.334   10
Long-horizon prediction             SCAND                                                 LPIPS 0.396   10
Long-horizon prediction             HuRoN                                                 LPIPS 0.27    10
Goal Conditioned Visual Navigation  Goal-Conditioned Visual Navigation 2 seconds horizon  ATE 1.22      6
Goal Conditioned Visual Navigation  RECON (4s horizon)                                    ATE 1.69      2
Goal Conditioned Visual Navigation  RECON (8s horizon)                                    ATE 5.8       2
Goal Conditioned Visual Navigation  HuRoN 4s horizon                                      ATE 9.23      2
Goal Conditioned Visual Navigation  HuRoN 8s horizon                                      ATE 28.76     2
Goal Conditioned Visual Navigation  HuRoN 16s horizon                                     ATE 56.7      2
Showing 10 of 17 rows
