
SimLingo: Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment

About

Integrating large language models (LLMs) into autonomous driving has attracted significant attention, with the hope of improving generalization and explainability. However, existing methods often focus on either driving or vision-language understanding; achieving both high driving performance and extensive language understanding remains challenging. In addition, the dominant approach to vision-language understanding is visual question answering. For autonomous driving, however, this is only useful if it is aligned with the action space; otherwise, the model's answers could be inconsistent with its behavior. Therefore, we propose a model that can handle three different tasks: (1) closed-loop driving, (2) vision-language understanding, and (3) language-action alignment. Our model, SimLingo, is based on a vision-language model (VLM) and uses only camera input, excluding expensive sensors like LiDAR. SimLingo obtains state-of-the-art performance in the widely used CARLA simulator on the Bench2Drive benchmark and is the winning entry at the CARLA Challenge 2024. Additionally, we achieve strong results in a wide variety of language-related tasks while maintaining high driving performance.

Katrin Renz, Long Chen, Elahe Arani, Oleg Sinavski • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Closed-loop Planning | Bench2Drive | Driving Score | 86.02 | 152 |
| Closed-loop Autonomous Driving | Bench2Drive | Driving Score (DS) | 85.07 | 74 |
| Autonomous Driving | Bench2Drive | Driving Score | 97.02 | 34 |
| Closed-loop Autonomous Driving | Bench2Drive closed-loop | DS | 85.07 | 34 |
| Autonomous Driving | Bench2Drive base set closed-loop | Driving Score (DS) | 85.1 | 32 |
| Autonomous Driving | Bench2Drive Multi-Ability | Merging Score | 0.54 | 25 |
| Closed-loop Collaborative Driving | CARLA Town05 | Disagreement Score (V0) | 32.14 | 20 |
| End-to-end Autonomous Driving | Bench2Drive standard base set (1K clips) | Driving Score (DS) | 85.9 | 19 |
| Closed-loop End-to-End Autonomous Driving | Bench2Drive base set | Driving Score (DS) | 85.94 | 17 |
| Closed-loop Planning | CARLA Bench2Drive (leaderboard) | Driving Score (DS) | 85.07 | 17 |
Showing 10 of 24 rows
