Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving
About
Global navigation information and local scene understanding are two crucial components of autonomous driving systems. However, our experimental results indicate that many end-to-end autonomous driving systems tend to over-rely on local scene understanding while failing to utilize global navigation information. These systems exhibit weak correlation between their planning capabilities and navigation input, and struggle to perform navigation-following in complex scenarios. To overcome this limitation, we propose the Sequential Navigation Guidance (SNG) framework, an efficient representation of global navigation information based on real-world navigation patterns. The SNG encompasses both navigation paths for constraining long-term trajectories and turn-by-turn (TBT) information for real-time decision-making logic. We constructed the SNG-QA dataset, a visual question answering (VQA) dataset based on SNG that aligns global and local planning. Additionally, we introduce an efficient model SNG-VLA that fuses local planning with global planning. The SNG-VLA achieves state-of-the-art performance through precise navigation information modeling without requiring auxiliary loss functions from perception tasks. Project page: SNG-VLA
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Autonomous Driving | Bench2Drive | Merging Score33.8 | 31 | |
| Closed-loop Trajectory Planning | NAVSIM (navtest) | NC98.9 | 24 | |
| Autonomous Driving | Bench2Drive | Driving Score67.17 | 12 | |
| Trajectory Prediction | Bench2Drive | Average L2 Error0.82 | 11 |