WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving

About

Latent World Models enhance scene representation through temporal self-supervised learning, presenting a perception annotation-free paradigm for end-to-end autonomous driving. However, the reconstruction-oriented representation learning entangles perception with planning tasks, leading to suboptimal optimization for planning. To address this challenge, we propose WorldRFT, a planning-oriented latent world model framework that aligns scene representation learning with planning via a hierarchical planning decomposition and local-aware interactive refinement mechanism, augmented by reinforcement learning fine-tuning (RFT) to enhance safety-critical policy performance. Specifically, WorldRFT integrates a vision-geometry foundation model to improve 3D spatial awareness, employs hierarchical planning task decomposition to guide representation optimization, and utilizes local-aware iterative refinement to derive a planning-oriented driving policy. Furthermore, we introduce Group Relative Policy Optimization (GRPO), which applies trajectory Gaussianization and collision-aware rewards to fine-tune the driving policy, yielding systematic improvements in safety. WorldRFT achieves state-of-the-art (SOTA) performance on both the open-loop nuScenes and closed-loop NavSim benchmarks. On nuScenes, it reduces the collision rate by 83% (0.30% -> 0.05%). On NavSim, using camera-only sensor input, it attains performance competitive with the LiDAR-based SOTA method DiffusionDrive (87.8 vs. 88.1 PDMS).
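The abstract names GRPO with trajectory Gaussianization and collision-aware rewards but gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "Gaussianization" means treating predicted waypoints as the means of an isotropic Gaussian policy, and it uses the standard GRPO group-relative advantage (each reward standardized within its sampled group). The reward shape, the toy obstacle check, and every function and parameter name below are hypothetical.

```python
import math
import random
import statistics


def sample_trajectories(mean_traj, sigma, group_size, rng):
    """Sample a group of trajectories around the predicted waypoints (assumed Gaussian policy)."""
    return [[(rng.gauss(x, sigma), rng.gauss(y, sigma)) for x, y in mean_traj]
            for _ in range(group_size)]


def gaussian_log_prob(traj, mean_traj, sigma):
    """Log-density of a trajectory under an isotropic 2D Gaussian per waypoint."""
    log_prob = 0.0
    for (x, y), (mx, my) in zip(traj, mean_traj):
        sq_dist = (x - mx) ** 2 + (y - my) ** 2
        log_prob += -sq_dist / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)
    return log_prob


def collides(traj, obstacle, radius):
    """Toy collision check (hypothetical): any waypoint within `radius` of a point obstacle."""
    ox, oy = obstacle
    return any(math.hypot(x - ox, y - oy) < radius for x, y in traj)


def collision_aware_reward(traj, expert_traj, obstacle, radius, collision_penalty=1.0):
    """Hypothetical reward: negative mean L2 error to the expert, minus a penalty on collision."""
    l2 = statistics.fmean(math.hypot(x - ex, y - ey)
                          for (x, y), (ex, ey) in zip(traj, expert_traj))
    return -l2 - (collision_penalty if collides(traj, obstacle, radius) else 0.0)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    mean_traj = [(0.0, 1.0), (0.0, 2.0), (0.1, 3.0)]    # predicted waypoints (x, y) in metres
    expert_traj = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]  # hypothetical expert trajectory
    group = sample_trajectories(mean_traj, sigma=0.2, group_size=4, rng=rng)
    rewards = [collision_aware_reward(t, expert_traj, obstacle=(0.5, 2.0), radius=0.3)
               for t in group]
    advantages = group_relative_advantages(rewards)
    # Policy-gradient surrogate to maximize: advantage-weighted log-probabilities.
    surrogate = statistics.fmean(a * gaussian_log_prob(t, mean_traj, 0.2)
                                 for a, t in zip(advantages, group))
    print(advantages, surrogate)
```

In a real fine-tuning loop the surrogate would be differentiated with respect to the policy parameters, and GRPO normally adds a clipped probability ratio and a KL term against a reference policy; both are left out here for brevity.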

Pengxuan Yang, Ben Lu, Zhongpu Xia, Chao Han, Yinfeng Gao, Teng Zhang, Kun Zhan, XianPeng Lang, Yupeng Zheng, Qichao Zhang • 2025

Related benchmarks

Task	Dataset	Result
Autonomous Driving Planning	NAVSIM v1 (test)	NC 97.8	151
Autonomous Driving	NAVSIM v1 (test)	NC 97.8	147
Planning	nuScenes (val)	Collision Rate (Avg) 4	97
Autonomous Driving Planning	NAVSIM v2 (Navtest)	NC 97.8	76
Planning	NAVSIM v1 (test)	PDMS 87.8	62
Autonomous Driving Planning	NAVSIM v2 (test)	NC 97.8	52
End-to-end Planning	nuScenes	L2 Error (3s) 0.76	45
Autonomous Driving Planning	NAVSIM v2	NC 97.8	37
Autonomous Driving Planning	NAVSIM v1 (navtest)	NC 97.8	37
Closed-loop Planning	NAVSIM v2	NC 97.8	27

