
NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

About

Vision-Language-Action (VLA) models are advancing autonomous driving by replacing modular pipelines with unified end-to-end architectures. However, current VLAs impose two expensive requirements: (1) massive dataset collection and (2) dense reasoning annotations. In this work, we address both challenges with NoRD (No Reasoning for Driving). Compared to existing VLAs, NoRD achieves competitive performance while being fine-tuned on less than 60% of the data and without any reasoning annotations, resulting in 3x fewer training tokens. We find that standard Group Relative Policy Optimization (GRPO) fails to yield significant improvements when applied to policies trained on such small, reasoning-free datasets. We show that this limitation stems from difficulty bias: GRPO's reward normalization disproportionately penalizes reward signals from scenarios that produce high-variance rollouts. NoRD overcomes this by adopting Dr. GRPO, a recent algorithm designed to mitigate difficulty bias in LLMs. As a result, NoRD achieves competitive performance on Waymo and NAVSIM with a fraction of the training data and no reasoning overhead, enabling more efficient autonomous driving systems. Website: https://nord-vla-ai.github.io/
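The difficulty bias mentioned above can be illustrated with a minimal sketch. Standard GRPO normalizes each rollout group's rewards by both the group mean and the group standard deviation; dividing by the standard deviation shrinks the advantages of scenarios whose rollouts have high reward variance. Dr. GRPO drops that division so a scenario's reward spread no longer rescales its policy gradient. The function names and toy reward values below are illustrative, not from the NoRD codebase:

```python
import numpy as np

def grpo_advantages(rewards):
    """Standard GRPO: mean-center, then divide by the group std.
    The std division equalizes advantage scale across groups, which
    down-weights high-variance (hard) scenarios -- the difficulty bias."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def dr_grpo_advantages(rewards):
    """Dr. GRPO: mean-center only. A group's reward spread is preserved,
    so hard scenarios keep their full-magnitude learning signal."""
    r = np.asarray(rewards, dtype=float)
    return r - r.mean()

# Two rollout groups with the same mean reward but different spread
# (hypothetical values for illustration):
easy = [0.9, 1.0, 1.0, 1.1]   # low-variance scenario
hard = [0.0, 0.5, 1.5, 2.0]   # high-variance scenario
```

Under standard GRPO both groups end up with (near) unit-variance advantages, erasing the distinction; under Dr. GRPO the high-variance scenario retains proportionally larger advantages.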

Ishaan Rawal, Shubh Gupta, Yihan Hu, Wei Zhan • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| End-to-end Driving | Waymo E2E Driving Challenge (Leaderboard) | RFS (Overall): 7.709 | 16 |
| Trajectory Prediction | NAVSIM (navtest) | PDMS: 92.4 | 9 |
