
C$^2$T: Captioning-Structure and LLM-Aligned Common-Sense Reward Learning for Traffic-Vehicle Coordination

About

State-of-the-art (SOTA) urban traffic control increasingly employs Multi-Agent Reinforcement Learning (MARL) to coordinate Traffic Light Controllers (TLCs) and Connected Autonomous Vehicles (CAVs). However, the performance of these systems is fundamentally capped by their hand-crafted, myopic rewards (e.g., intersection pressure), which fail to capture high-level, human-centric goals like safety, flow stability, and comfort. To overcome this limitation, we introduce C2T, a novel framework that learns a common-sense coordination model from traffic-vehicle dynamics. C2T distills "common-sense" knowledge from a Large Language Model (LLM) into a learned intrinsic reward function. This new reward is then used to guide the coordination policy of a cooperative multi-intersection TLC MARL system on CityFlow-based multi-intersection benchmarks. Our framework significantly outperforms strong MARL baselines in traffic efficiency, safety, and an energy-related proxy. We further highlight C2T's flexibility in principle, allowing distinct "efficiency-focused" versus "safety-focused" policies by modifying the LLM prompt.
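The idea of augmenting a hand-crafted reward with a learned intrinsic term can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the simplified pressure definition (absolute difference between incoming and outgoing vehicle counts), the additive mixing rule with weight `beta`, and the names `pressure_reward`, `shaped_reward`, and `toy_intrinsic`. In C2T the intrinsic term would come from a model distilled from LLM judgments; here a hand-written stand-in takes its place.

```python
from typing import Callable, Mapping, Sequence


def pressure_reward(incoming: Sequence[int], outgoing: Sequence[int]) -> float:
    """Negative intersection pressure (simplified, assumed definition).

    Pressure is taken as |total incoming vehicles - total outgoing vehicles|.
    """
    return -float(abs(sum(incoming) - sum(outgoing)))


def shaped_reward(
    incoming: Sequence[int],
    outgoing: Sequence[int],
    features: Mapping[str, float],
    intrinsic_model: Callable[[Mapping[str, float]], float],
    beta: float = 0.5,
) -> float:
    """Hand-crafted reward plus a weighted intrinsic term (assumed additive mix)."""
    return pressure_reward(incoming, outgoing) + beta * intrinsic_model(features)


def toy_intrinsic(features: Mapping[str, float]) -> float:
    """Hypothetical stand-in for a learned reward: penalise hard-braking events."""
    return -float(features["hard_brakes"])


# Two incoming lanes with 4 and 3 vehicles, two outgoing lanes with 2 and 1.
r = shaped_reward([4, 3], [2, 1], {"hard_brakes": 2}, toy_intrinsic, beta=0.5)
print(r)
```

Swapping `toy_intrinsic` for a different scoring function changes what the agent is rewarded for without touching the pressure term, which is the kind of flexibility the abstract attributes to changing the LLM prompt.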

Yuyang Chen, Kaiyan Zhao, Yiming Wang, Ming Yang, Bin Rao, Zhenning Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Traffic Signal Control | Jinan-2 | Average Travel Time (ATT) | 53 | 48 |
| Traffic Signal Control | Jinan-1 | Avg Travel Time (ATT) | 56.1 | 38 |
| Traffic Signal Control | Hangzhou (HZ-1) | Average Travel Time (ATT) | 65.1 | 24 |
| Traffic Signal Control | New York (196 intersections) | Average Travel Time | 87.9 | 2 |
| Traffic Signal Control | Hangzhou-2 | Average Travel Time (ATT) | 62.4 | 2 |
| Traffic Signal Control | CityFlow Extreme High-traffic Stress (test) | Average Travel Time (ATT) | 96.8 | 2 |
| Traffic Signal Control | CityFlow 24-hour Cycle Stress Test | Average Travel Time (ATT) | 72.2 | 2 |