
DGLight: DQN-Guided GRPO Fine-Tuning of Large Language Models for Traffic Signal Control

About

Traffic signal control (TSC) plays a central role in reducing congestion and maintaining urban mobility. This dissertation introduces DGLight, a critic-guided reinforcement-learning framework for adapting a pretrained large language model to TSC. DGLight first trains a CoLight-based Deep Q-Network critic to estimate traffic-aware action values from structured intersection states, then uses the frozen critic to score candidate language-model actions and optimize the policy with Group Relative Policy Optimization (GRPO). The resulting controller maps traffic states to interpretable reasoning traces and signal decisions while learning from dense per-state supervision rather than raw cumulative environment rewards. Experiments on TSC benchmarks covering Jinan and Hangzhou show that DGLight is the strongest overall method among the compared LLM-based controllers, remains competitive with strong RL baselines, and transfers well to city datasets not used to fit the critic. Qualitative examples further show that the model's generated reasoning is interpretable and aligned with the chosen signal phase. The project code is available at https://github.com/yyccbb/FYP_LLMTSC.
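The critic-guided scoring step described above can be illustrated with a minimal sketch. It assumes that the frozen critic's Q-value for each sampled signal phase is used directly as that sample's reward, and that GRPO standardizes rewards within the group of samples drawn for one traffic state. The function name, the phase labels, and the Q-values are hypothetical and are not taken from the project code.

```python
from statistics import mean, pstdev


def group_relative_advantages(q_values, sampled_actions, eps=1e-8):
    """Standardize critic scores within one group of sampled actions.

    q_values: dict mapping a signal phase to the critic's Q-value for the
              current intersection state (hypothetical numbers below).
    sampled_actions: phases chosen by the language model across several
              sampled completions for the same state.
    """
    # Assumption: the critic's Q-value serves as the per-sample reward.
    rewards = [q_values[action] for action in sampled_actions]
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # GRPO-style advantage: reward relative to the group mean, scaled by
    # the group std; eps avoids division by zero when all rewards match.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical critic output for one state and four candidate phases.
q_values = {"ETWT": 1.0, "NTST": 3.0, "ELWL": 2.0, "NLSL": 0.0}
# Hypothetical phases picked by four sampled completions.
sampled_actions = ["NTST", "ETWT", "NTST", "ELWL"]

advantages = group_relative_advantages(q_values, sampled_actions)
for action, adv in zip(sampled_actions, advantages):
    print(f"{action}: {adv:+.3f}")
```

Completions that pick a phase the critic rates above the group average receive a positive advantage and are reinforced, while those below the average are penalized. Because every sampled completion gets a score for the same state, the signal is dense and per-state rather than dependent on a full episode's cumulative reward.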

Chenbo Yu • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Traffic Signal Control | Jinan-2 | Average Travel Time (ATT) | 269.7 | 52 |
| Traffic Signal Control | Jinan-1 | Avg Travel Time (ATT) | 275.9 | 42 |
| Traffic Signal Control | Hangzhou (HZ-1) | Average Travel Time (ATT) | 316.6 | 28 |
| Traffic Signal Control | Jinan (JN-3) | Average Travel Time (ATT) | 262.2 | 26 |
| Traffic Signal Control | Hangzhou-2 | Average Travel Time (ATT) | 333.1 | 6 |
