DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving

About

End-to-end (E2E) autonomous driving has recently attracted increasing interest in unifying Vision-Language-Action (VLA) with World Models to enhance decision-making and forward-looking imagination. However, existing methods fail to effectively unify future scene evolution and action planning within a single architecture due to inadequate sharing of latent states, which limits the impact of visual imagination on action decisions. To address this limitation, we propose DriveWorld-VLA, a novel framework that unifies world modeling and planning within a latent space by tightly integrating VLA and world models at the representation level. This enables the VLA planner to benefit directly from holistic scene-evolution modeling and reduces reliance on dense annotated supervision. Additionally, DriveWorld-VLA incorporates the latent states of the world model as core decision-making states for the VLA planner, allowing the planner to assess how candidate actions affect future scene evolution. By conducting world modeling entirely in the latent space, DriveWorld-VLA supports controllable, action-conditioned imagination at the feature level, avoiding expensive pixel-level rollouts. Extensive open-loop and closed-loop evaluations demonstrate the effectiveness of DriveWorld-VLA, which achieves state-of-the-art performance with 91.3 PDMS on NAVSIMv1, 86.8 EPDMS on NAVSIMv2, and a 3-second average collision rate of 0.16 on nuScenes. Code and models will be released at https://github.com/liulin815/DriveWorld-VLA.git.
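
To make the idea of action-conditioned imagination in latent space more concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a two-dimensional latent state, a hand-written linear transition standing in for a learned world model, and a squared-distance cost standing in for a learned scoring head. All names (`toy_transition`, `toy_cost`, `rollout_cost`, `select_plan`) and the (acceleration, steering) action format are hypothetical.

```python
from typing import List, Sequence, Tuple

Latent = List[float]
Action = Tuple[float, float]  # hypothetical (acceleration, steering) pair


def toy_transition(z: Latent, a: Action) -> Latent:
    """Stand-in for a learned latent world model (assumption, not the paper's model)."""
    accel, steer = a
    return [0.9 * z[0] + accel, 0.9 * z[1] + steer]


def toy_cost(z: Latent, goal: Latent) -> float:
    """Stand-in for a learned scoring head: squared distance to a goal latent."""
    return sum((zi - gi) ** 2 for zi, gi in zip(z, goal))


def rollout_cost(z0: Latent, actions: Sequence[Action], goal: Latent) -> float:
    """Imagine the future purely in latent space and accumulate the cost per step."""
    z = list(z0)
    total = 0.0
    for a in actions:
        z = toy_transition(z, a)  # no pixels are rendered at any point
        total += toy_cost(z, goal)
    return total


def select_plan(
    z0: Latent, candidates: Sequence[Sequence[Action]], goal: Latent
) -> Sequence[Action]:
    """Pick the candidate action sequence whose imagined rollout is cheapest."""
    return min(candidates, key=lambda acts: rollout_cost(z0, acts, goal))


z0 = [0.0, 0.0]
goal = [1.0, 0.0]
candidates = [
    [(0.0, 0.0), (0.0, 0.0)],  # stay put
    [(0.5, 0.0), (0.5, 0.0)],  # accelerate straight
    [(0.5, 0.5), (0.5, 0.5)],  # accelerate while steering
]
best = select_plan(z0, candidates, goal)
print(best)
```

In this toy setup the planner compares candidates by their imagined latent futures alone, which is the property the abstract attributes to feature-level rollouts; a real system would replace both stand-in functions with learned networks.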

Feiyang Jia, Lin Liu, Ziying Song, Caiyan Jia, Hangjun Ye, Xiaoshuai Hao, Long Chen • 2026

Related benchmarks

Task	Dataset	Result
Autonomous Driving Planning	NAVSIM v1 (test)	NC 99.1	151
Autonomous Driving Planning	NAVSIM v2 (Navtest)	NC 98.6	76
Open-loop planning	nuScenes v1.0 (val)	L2 (1s) 0.28	71
Planning	NAVSIM v1 (test)	PDMS 91.3	62
Autonomous Driving Planning	NAVSIM v2	NC 98.6	37
Planning	NAVSIM v2 (Navtest)	NC 98.6	32
Closed-loop Planning	NAVSIM v2	NC 98.6	27
Autonomous Driving Planning	NAVSIM	NC 99.1	26
Planning	nuScenes v1.0 (val)	Collision Rate (3s) 0.38	22
Closed-loop Planning	NAVSIM v1	NC 99.1	17
