SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving

About

Recent end-to-end autonomous driving approaches have leveraged Vision-Language Models (VLMs) to enhance planning capabilities in complex driving scenarios. However, VLMs are inherently trained as generalist models, lacking specialized understanding of driving-specific reasoning in 3D space and time. When applied to autonomous driving, these models struggle to establish structured spatial-temporal representations that capture geometric relationships, scene context, and motion patterns critical for safe trajectory planning. To address these limitations, we propose SGDrive, a novel framework that explicitly structures the VLM's representation learning around driving-specific knowledge hierarchies. Built upon a pre-trained VLM backbone, SGDrive decomposes driving understanding into a scene-agent-goal hierarchy that mirrors human driving cognition: drivers first perceive the overall environment (scene context), then attend to safety-critical agents and their behaviors, and finally formulate short-term goals before executing actions. This hierarchical decomposition provides the structured spatial-temporal representation that generalist VLMs lack, integrating multi-level information into a compact yet comprehensive format for trajectory planning. Extensive experiments on the NAVSIM benchmark demonstrate that SGDrive achieves state-of-the-art performance among camera-only methods on both PDMS and EPDMS, validating the effectiveness of hierarchical knowledge structuring for adapting generalist VLMs to autonomous driving.
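As a rough illustration of the scene-agent-goal ordering described in the abstract, the following Python sketch organizes a driving state into three levels and serializes them in that order. It is not SGDrive's implementation: every class, field, and function name here (`SceneContext`, `Agent`, `Goal`, `DrivingState`, `summarize`) is hypothetical, and the abstract does not specify how the actual representation is encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SceneContext:
    # Hypothetical: a coarse description of the overall environment.
    description: str


@dataclass
class Agent:
    # Hypothetical: one safety-critical agent in the ego frame.
    category: str
    position: Tuple[float, float]  # (x, y) in metres
    velocity: Tuple[float, float]  # (vx, vy) in metres per second


@dataclass
class Goal:
    # Hypothetical: a short-term target the planner should reach.
    waypoint: Tuple[float, float]  # (x, y) in metres
    horizon_s: float               # time budget in seconds


@dataclass
class DrivingState:
    scene: SceneContext
    agents: List[Agent] = field(default_factory=list)
    goal: Optional[Goal] = None


def summarize(state: DrivingState) -> List[str]:
    """Serialize the state in scene -> agents -> goal order."""
    lines = [f"scene: {state.scene.description}"]
    for a in state.agents:
        lines.append(
            f"agent: {a.category} at ({a.position[0]:.1f}, {a.position[1]:.1f}) m"
        )
    if state.goal is not None:
        g = state.goal
        lines.append(
            f"goal: reach ({g.waypoint[0]:.1f}, {g.waypoint[1]:.1f}) m "
            f"in {g.horizon_s:.1f} s"
        )
    return lines
```

A downstream planner would consume such a compact, ordered summary instead of raw perception outputs; the choice of fields above is only one possible assumption about what each level might contain.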

Jingyu Li, Junjie Wu, Dongnan Hu, Xiangkai Huang, Bin Sun, Zhihui Hao, Xianpeng Lang, Xiatian Zhu, Li Zhang • 2026

Related benchmarks

Task	Dataset	Result
Autonomous Driving Planning	NAVSIM v2 (Navtest)	NC 98.6	76
Closed-loop Autonomous Driving Planning	NAVSIM v1 (test)	NC 98.6	63
Autonomous Driving	NAVSIM (test)	PDMS 87.4	62
Autonomous Driving Planning	NAVSIM v2 (test)	NC 98.6	52
Closed-loop Planning	NAVSIM v1 (test)	PDMS 87.4	38
Motion Planning	NAVSIM v1 (test)	NC 98.6	27
Autonomous Driving Planning	NAVSIM	NC 98.6	26
Planning	NAVSIM v1	PDMS 91.1	23
Open-loop planning	NAVSIM v2 (Navtest)	NC 98.6	8
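For readers unfamiliar with the metrics in the table, the sketch below shows how the NAVSIM v1 PDM Score (PDMS) combines its sub-scores, following the definition in the NAVSIM benchmark: no at-fault collision (NC) and drivable area compliance (DAC) act as multiplicative penalties, while ego progress (EP), time-to-collision (TTC), and comfort (C) are averaged with weights 5, 5, and 2. The function name `pdm_score` is hypothetical, inputs are assumed to be fractions in [0, 1] (the table reports percentages), and EPDMS from NAVSIM v2 uses an extended formula that is not covered here.

```python
def pdm_score(nc: float, dac: float, ep: float, ttc: float, comfort: float) -> float:
    """Aggregate NAVSIM v1 sub-scores into a single PDM Score.

    All inputs are assumed to lie in [0, 1].
    """
    for name, value in (
        ("nc", nc), ("dac", dac), ("ep", ep), ("ttc", ttc), ("comfort", comfort)
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Weighted average of the soft sub-scores (weights 5, 5, 2; sum 12).
    weighted = (5.0 * ep + 5.0 * ttc + 2.0 * comfort) / 12.0

    # NC and DAC multiply the result, so a failure in either zeroes the score.
    return nc * dac * weighted
```

Because NC and DAC are multiplicative, a high NC value alone does not imply a high PDMS; the two metrics in the table measure different things and are not directly comparable.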

