PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning
About
Vehicle motion planning is an essential component of autonomous driving technology. Current rule-based vehicle motion planning methods perform satisfactorily in common scenarios but struggle to generalize to long-tailed situations. Meanwhile, learning-based methods have yet to outperform rule-based approaches in large-scale closed-loop scenarios. To address these issues, we propose PlanAgent, the first mid-to-mid planning system based on a Multi-modal Large Language Model (MLLM). The MLLM serves as a cognitive agent that brings human-like knowledge, interpretability, and common-sense reasoning to closed-loop planning. Specifically, PlanAgent leverages the MLLM through three core modules. First, an Environment Transformation module constructs a Bird's Eye View (BEV) map and a lane-graph-based textual description of the environment as inputs. Second, a Reasoning Engine module introduces a hierarchical chain-of-thought that proceeds from scene understanding to lateral and longitudinal motion instructions and culminates in planner code generation. Last, a Reflection module simulates and evaluates the generated planner to reduce the MLLM's uncertainty. PlanAgent inherits the common-sense reasoning and generalization capability of the MLLM, which enables it to handle both common and complex long-tailed scenarios. PlanAgent is evaluated on the large-scale and challenging nuPlan benchmarks, where a comprehensive set of experiments shows that it outperforms existing state-of-the-art methods on the closed-loop motion planning task. Code will be released soon.
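The three-module flow described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation, since the code has not been released. Every name here (`SceneInputs`, `Plan`, `transform_environment`, `plan_with_reflection`, the `reason` and `simulate` callables, and the `min_score` and `max_rounds` parameters) is a hypothetical stand-in, and the MLLM and the simulator are assumed to be supplied by the caller as plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SceneInputs:
    bev_map: List[List[int]]  # hypothetical stand-in for a rendered BEV image
    lane_text: str            # lane-graph-based textual description


@dataclass
class Plan:
    scene_summary: str   # output of the scene-understanding step
    lateral: str         # lateral motion instruction, e.g. "keep_lane"
    longitudinal: str    # longitudinal motion instruction, e.g. "decelerate"
    planner_code: str    # generated planner code or parameters


def transform_environment(scene: Dict) -> SceneInputs:
    """Environment Transformation: build a toy BEV grid and a text description."""
    size = scene["grid_size"]
    bev = [[0] * size for _ in range(size)]
    for x, y in scene["agents"]:
        bev[y][x] = 1  # mark each agent's cell as occupied
    lanes = "; ".join(
        f"lane {src} connects to {', '.join(dst)}"
        for src, dst in sorted(scene["lane_graph"].items())
    )
    return SceneInputs(bev_map=bev, lane_text=lanes)


def plan_with_reflection(
    scene: Dict,
    reason: Callable[[SceneInputs, str], Plan],  # assumed wrapper around the MLLM
    simulate: Callable[[Plan], float],           # assumed simulator returning a score
    min_score: float = 0.8,                      # assumed acceptance threshold
    max_rounds: int = 3,                         # assumed retry budget
) -> Tuple[Plan, float]:
    """Run reasoning, then simulate and re-query until the plan is acceptable."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    inputs = transform_environment(scene)
    feedback = ""
    best: Optional[Tuple[Plan, float]] = None
    for _ in range(max_rounds):
        plan = reason(inputs, feedback)   # Reasoning Engine step
        score = simulate(plan)            # Reflection step: simulate and evaluate
        if best is None or score > best[1]:
            best = (plan, score)
        if score >= min_score:
            break
        feedback = f"previous plan scored {score:.2f}; revise the instructions"
    assert best is not None
    return best
```

In this sketch the reflection step is a simple retry loop that feeds the simulated score back to the reasoning function and keeps the best-scoring plan. The abstract only says that the generated planner is simulated and evaluated, so the feedback format and stopping rule are illustrative choices.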
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Closed-loop Planning | nuPlan 14 Hard (test) | -- | -- | 73 |
| Trajectory Planning | nuPlan 14 (val) | NR Metric | 93.26 | 27 |
| Trajectory Planning | nuPlan 14 Hard (test) | CLS-NR Score | 72.51 | 24 |
| Closed-loop motion planning | nuPlan 14 Hard (test) | NR-CLS | 72.51 | 15 |
| Driving Visual Question Answering | nuScenes-QA | QA EM | 55.7 | 10 |
| Question Answering | Waymo-QA | QA EM | 52.5 | 10 |