Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving in the Long Tail
About
End-to-end architectures trained via imitation learning have advanced autonomous driving by scaling model size and data, yet performance remains brittle in safety-critical long-tail scenarios where supervision is sparse and causal understanding is limited. We introduce Alpamayo-R1 (AR1), a vision-language-action model (VLA) that integrates Chain of Causation reasoning with trajectory planning for complex driving scenarios. Our approach features three key innovations: (1) the Chain of Causation (CoC) dataset, built through a hybrid auto-labeling and human-in-the-loop pipeline producing decision-grounded, causally linked reasoning traces aligned with driving behaviors; (2) a modular VLA architecture combining Cosmos-Reason, a vision-language model pre-trained for Physical AI, with a diffusion-based trajectory decoder that generates dynamically feasible trajectories in real time; (3) a multi-stage training strategy using supervised fine-tuning to elicit reasoning and reinforcement learning (RL) to enforce reasoning-action consistency and optimize reasoning quality. AR1 achieves up to a 12% improvement in planning accuracy on challenging cases compared to a trajectory-only baseline, with a 35% reduction in close encounter rate in closed-loop simulation. RL post-training improves reasoning quality by 45% and reasoning-action consistency by 37%. Model scaling from 0.5B to 7B parameters shows consistent improvements. On-vehicle road tests confirm real-time performance (99 ms latency) and successful urban deployment. By bridging interpretable reasoning with precise control, AR1 demonstrates a practical path towards Level 4 autonomous driving. Model weights are available at https://huggingface.co/nvidia/Alpamayo-R1-10B with inference code at https://github.com/NVlabs/alpamayo.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| End-to-end Autonomous Driving | BridgeSim NavHard | DS 37.78 | 15 |
| Trajectory Generation | CARLA FPV 1000 samples (test) | Score 0.19 | 10 |
| Autonomous Driving Performance Evaluation | PhysicalAI-AV v1 (test) | ADE 1.65 | 8 |
| Trajectory Planning | HUGSIM nuScenes easy | NC 0.976 | 6 |
| Trajectory Planning | HUGSIM nuScenes (medium) | NC Score 0.977 | 6 |
| Trajectory Planning | HUGSIM nuScenes (hard) | NC 0.968 | 6 |
| Trajectory Planning | HUGSIM nuScenes (extreme) | NC 0.962 | 6 |
| Trajectory Planning | HUGSIM nuScenes (Overall) | NC 0.971 | 6 |
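
The ADE entry in the table is assumed here to be average displacement error, the usual trajectory metric: the mean Euclidean distance between predicted and ground-truth waypoints at matching timesteps. The page does not say how the benchmark computes it (horizon, units, or how multiple sampled trajectories are handled), so the following is only a minimal sketch of the generic definition. The function name and the toy waypoints are hypothetical and are not taken from the AR1 code or the benchmark.

```python
import math

def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching waypoints.

    Both arguments are sequences of (x, y) points sampled at the same
    timesteps. This is the generic textbook definition, not the
    benchmark's official evaluation code.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)

# Toy example (made-up waypoints): per-step errors are 0, 1 and 2.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(average_displacement_error(pred, truth))  # 1.0
```

Lower values are better under this definition, since they mean the predicted path stays closer to the reference path on average.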