
Navigation World Models

About

Navigation is a fundamental skill of agents with visual-motor capabilities. We introduce a Navigation World Model (NWM), a controllable video generation model that predicts future visual observations based on past observations and navigation actions. To capture complex environment dynamics, NWM employs a Conditional Diffusion Transformer (CDiT), trained on a diverse collection of egocentric videos of both human and robotic agents, and scaled up to 1 billion parameters. In familiar environments, NWM can plan navigation trajectories by simulating them and evaluating whether they achieve the desired goal. Unlike supervised navigation policies with fixed behavior, NWM can dynamically incorporate constraints during planning. Experiments demonstrate its effectiveness in planning trajectories from scratch or by ranking trajectories sampled from an external policy. Furthermore, NWM leverages its learned visual priors to imagine trajectories in unfamiliar environments from a single input image, making it a flexible and powerful tool for next-generation navigation systems.

Amir Bar, Gaoyue Zhou, Danny Tran, Trevor Darrell, Yann LeCun • 2024
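The planning idea described in the abstract (simulate candidate trajectories with a world model, then keep the one whose predicted outcome best matches the goal) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `toy_world_model` is a hypothetical stand-in in which an "observation" is just a 2D position and an action is a 2D displacement, and the score is a plain negative Euclidean distance to the goal rather than any learned or perceptual measure.

```python
import numpy as np

def toy_world_model(obs, action):
    """Hypothetical stand-in for a learned world model: predict the next observation.
    Here an observation is a 2D position and an action is a 2D displacement."""
    return obs + action

def rollout(world_model, obs, actions):
    """Simulate one candidate action sequence; return the final predicted observation."""
    for action in actions:
        obs = world_model(obs, action)
    return obs

def rank_trajectories(world_model, start_obs, goal_obs, candidates):
    """Score each candidate by how close its simulated endpoint is to the goal
    (higher is better) and return candidate indices sorted best-first."""
    scores = np.array([
        -np.linalg.norm(rollout(world_model, start_obs, actions) - goal_obs)
        for actions in candidates
    ])
    order = np.argsort(-scores)  # descending by score
    return order, scores

rng = np.random.default_rng(0)
start = np.zeros(2)
goal = np.array([3.0, 1.0])
# 64 candidate sequences of 8 actions each, sampled at random here;
# they could equally be proposals from an external policy.
candidates = rng.normal(scale=0.5, size=(64, 8, 2))
order, scores = rank_trajectories(toy_world_model, start, goal, candidates)
best_actions = candidates[order[0]]
```

In this toy setup a planning constraint could be imposed by filtering or penalizing candidates before ranking; how NWM itself incorporates constraints is not specified on this page.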

Related benchmarks

Task                                Dataset                Metric                   Result  Rank
Goal Conditioned Visual Navigation  SCAND                  ATE                      1.28    18
Goal Conditioned Visual Navigation  RECON                  ATE                      1.13    16
Visual generation                   2D trajectory dataset  LPIPS                    0.377   16
Geometric Drift Evaluation          HuRON                  Euclidean Distance (ED)  8.99    15
Geometric Drift Evaluation          TartanDrive            Endpoint Distance (ED)   6.41    15
Perceptual Drift                    RECON                  LPIPS                    0.33    15
Perceptual Drift                    SCAND                  LPIPS                    0.353   15
Perceptual Drift                    TartanDrive            LPIPS                    0.381   15
Geometric Drift Evaluation          RECON                  Euclidean Distance (ED)  9.4     15
Perceptual Drift                    HuRON                  LPIPS                    0.445   15
Showing 10 of 94 rows
