Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout

About

Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics forecasting with auxiliary behavioral and emotional semantic recognition. Operating in a compact latent space constructed from frozen vision-language features, Driver-WM adopts a dual-stream architecture to separately encode external traffic and internal driver states. These streams are directionally coupled via a gated causal injection mechanism, which uses a learned vector gate to modulate external contextual perturbations while strictly enforcing temporal causality. Experiments on AIDE show robust long-horizon forecasting on reactive high-motion clips, improved driver/traffic semantic alignment, and controlled interventions that expose the external-to-internal mechanism.
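The abstract describes the gated causal injection only at a high level. Here is a minimal NumPy sketch of one way such a block could work. It is built on our own assumptions and is not the authors' implementation. The class name `GatedCausalInjection`, the cumulative-mean causal summary of traffic latents, the sigmoid vector gate computed from the concatenated driver state and context, and the additive residual update are all hypothetical choices. The sketch illustrates the two properties the abstract names. First, a per-dimension learned gate scales the external perturbation. Second, the coupling is directional and temporally causal: driver step t sees traffic steps 0..t only, and the traffic stream is never modified.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GatedCausalInjection:
    """Hypothetical traffic-to-driver injection block (illustrative sketch only,
    not the Driver-WM implementation)."""

    def __init__(self, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Assumed: linear map turning traffic latents into a driver-space perturbation.
        self.w_ctx = rng.normal(0.0, scale, size=(dim, dim))
        # Assumed: gate computed from the concatenation [driver state; traffic context].
        self.w_gate = rng.normal(0.0, scale, size=(2 * dim, dim))
        self.b_gate = np.zeros(dim)

    def __call__(self, z_driver: np.ndarray, z_traffic: np.ndarray):
        """z_driver, z_traffic: latent sequences of shape (T, dim)."""
        T = z_traffic.shape[0]
        # Causal context: step t averages projected traffic latents over steps 0..t only.
        proj = z_traffic @ self.w_ctx
        ctx = np.cumsum(proj, axis=0) / np.arange(1, T + 1)[:, None]
        # Vector gate in (0, 1)^dim: one mixing weight per latent dimension.
        gate = sigmoid(np.concatenate([z_driver, ctx], axis=-1) @ self.w_gate + self.b_gate)
        # Only the driver stream is updated (external -> internal direction).
        return z_driver + gate * ctx, gate


# Usage: a 5-step latent sequence with 8-dimensional latents.
rng = np.random.default_rng(1)
T, D = 5, 8
z_d = rng.normal(size=(T, D))
z_e = rng.normal(size=(T, D))
block = GatedCausalInjection(D)
out, gate = block(z_d, z_e)

# Perturb only future traffic (steps 3 and 4): earlier driver outputs must not change.
z_e_future = z_e.copy()
z_e_future[3:] += 10.0
out2, _ = block(z_d, z_e_future)
print(np.allclose(out[:3], out2[:3]))  # True
```

The cumulative mean is just one simple causal summary. A causally masked attention layer over the traffic sequence would play the same role while letting the model weight past steps unevenly.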

Haozhuang Chi, Daosheng Qiu, Hao Su, Haochen Liu, Zirui Li, Haoruo Zhang, Chen Lv • 2026

Related benchmarks

Task	Dataset	Metric	Result
Semantic Recognition	AIDE 5→5 rollout (test)	DBR (F1)	73.35	12
Kinematic Rollout	AIDE High-Motion 5→5 rollout (test)	MPJPE (px)	136.5	12
Kinematic Rollout	AIDE 5→5 rollout (test)	MPJPE (px)	71.47	12
Multi-step Kinematic and Semantic Rollout	AIDE (test)	MPJPE (All, px)	71.53	11
Semantic Recognition	AIDE (test)	DBR Acc	73.34	5
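Because MPJPE is reported in pixels, it presumably measures error on 2D keypoints. As a hedged reference, this sketch computes the standard mean per-joint position error, MPJPE = (1 / (T·J)) Σ_t Σ_j ‖p̂_{t,j} − p_{t,j}‖₂. That is the Euclidean distance between predicted and ground-truth joint positions, averaged over frames and joints. Several points are our assumptions: the array layout (frames × joints × 2), the unweighted average, and reading "5→5" as 5 observed steps followed by 5 predicted steps. The listing does not specify the benchmark's exact protocol, such as the joint set, the masking of missing keypoints, or how the "All" and "High-Motion" subsets are aggregated.

```python
import numpy as np


def mpjpe(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean per-joint position error (standard definition, assumed protocol).

    pred, target: arrays of shape (..., J, 2) holding 2D keypoints in pixels.
    Returns the Euclidean error averaged over all leading axes and joints.
    """
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    per_joint = np.linalg.norm(pred - target, axis=-1)  # shape (..., J)
    return float(per_joint.mean())


# Hypothetical 5-frame prediction horizon, 3 keypoints, (x, y) in px.
target = np.zeros((5, 3, 2))
pred = target.copy()
pred[..., 0] += 3.0
pred[..., 1] += 4.0  # every joint is off by (3, 4) px, i.e. 5 px
print(mpjpe(pred, target))  # 5.0
```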

