Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

About

Reinforcement Learning (RL) can mitigate the causal confusion and distribution shift inherent to imitation learning (IL). However, applying RL to end-to-end autonomous driving (E2E-AD) remains an open problem because of its training difficulty, and IL is still the mainstream paradigm in both academia and industry. Recently, Model-based Reinforcement Learning (MBRL) has demonstrated promising results in neural planning; however, these methods typically require privileged information as input rather than raw sensor data. We fill this gap by designing Raw2Drive, a dual-stream MBRL approach. First, we efficiently train an auxiliary privileged world model paired with a neural planner that uses privileged information as input. Next, we introduce a raw sensor world model trained via our proposed Guidance Mechanism, which ensures consistency between the raw sensor world model and the privileged world model during rollouts. Finally, the raw sensor world model combines the prior knowledge embedded in the heads of the privileged world model to effectively guide the training of the raw sensor policy. Raw2Drive is so far the only RL-based end-to-end method on CARLA Leaderboard 2.0 and Bench2Drive, and it achieves state-of-the-art performance.
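
The abstract does not spell out the form of the Guidance Mechanism, so the following is only a minimal sketch of one way rollout consistency between two world models could be measured. It assumes both models expose a latent step function, that they are unrolled over the same action sequence, and that a mean squared distance between latent states is used. All names here (`rollout`, `rollout_consistency_loss`, `toy_privileged`, `toy_raw`) are hypothetical and not taken from the paper or its code.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Dynamics = Callable[[Vector, Vector], Vector]


def rollout(step: Dynamics, z0: Vector, actions: Sequence[Vector]) -> List[Vector]:
    """Unroll a latent dynamics function over a fixed action sequence."""
    states = []
    z = z0
    for a in actions:
        z = step(z, a)
        states.append(z)
    return states


def rollout_consistency_loss(
    privileged: Dynamics,
    raw: Dynamics,
    z0_priv: Vector,
    z0_raw: Vector,
    actions: Sequence[Vector],
) -> float:
    """Mean squared distance between two latent trajectories.

    Assumption: this loss form is illustrative only; the paper's actual
    Guidance Mechanism may differ.
    """
    priv_states = rollout(privileged, z0_priv, actions)
    raw_states = rollout(raw, z0_raw, actions)
    total = 0.0
    count = 0
    for zp, zr in zip(priv_states, raw_states):
        for p, r in zip(zp, zr):
            total += (p - r) ** 2
            count += 1
    return total / count if count else 0.0


# Toy linear stand-ins for the two world models (hypothetical).
def toy_privileged(z: Vector, a: Vector) -> Vector:
    return [0.9 * zi + ai for zi, ai in zip(z, a)]


def toy_raw(z: Vector, a: Vector) -> Vector:
    return [0.8 * zi + ai for zi, ai in zip(z, a)]
```

In a training loop, a term like this would be minimized with respect to the raw sensor world model only, with the privileged model held fixed, so that the raw sensor stream is pulled toward the privileged one over the imagined horizon.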

Zhenjie Yang, Xiaosong Jia, Qifeng Li, Xue Yang, Maoqing Yao, Junchi Yan • 2025

Related benchmarks

Task	Dataset	Result
Closed-loop Planning	Bench2Drive	Driving Score: 71.36	194
Closed-loop Autonomous Driving	Bench2Drive	Driving Score (DS): 71.36	74
Closed-loop Autonomous Driving	Bench2Drive closed-loop	DS: 71.36	46
End-to-end Autonomous Driving	Bench2Drive 220 routes official	Overall Success Rate (DS): 71.36	39
Autonomous Driving	Bench2Drive closed-loop (test)	SR (%): 50.24	28
Autonomous Driving	Bench2Drive Multi-Ability	Merging Score: 0.4335	25
Autonomous Driving	Bench2Drive base (train)	Driving Score: 71.36	19
Closed-loop Planning	CARLA Bench2Drive (leaderboard)	Driving Score (DS): 71.36	17
Closed-loop Planning	Bench2Drive base V0.0.3 (train)	Driving Score: 71.36	16
Closed-loop Autonomous Driving	Bench2Drive V0.0.3	Driving Score (DS): 71.36	16

Showing 10 of 13 rows
