DGLight: DQN-Guided GRPO Fine-Tuning of Large Language Models for Traffic Signal Control

About

Traffic signal control (TSC) plays a central role in reducing congestion and maintaining urban mobility. This dissertation introduces DGLight, a critic-guided reinforcement-learning framework for adapting a pretrained large language model to TSC. DGLight first trains a CoLight-based Deep Q-Network critic to estimate traffic-aware action values from structured intersection states, then uses the frozen critic to score candidate language-model actions and optimize the policy with Group Relative Policy Optimization (GRPO). The resulting controller maps traffic states to interpretable reasoning traces and signal decisions while learning from dense per-state supervision rather than raw cumulative environment rewards. Experiments on TSC benchmarks covering Jinan and Hangzhou show that DGLight is the strongest overall method among the compared LLM-based controllers, remains competitive with strong RL baselines, and transfers well to city datasets not used to fit the critic. Qualitative examples further show that the model's generated reasoning is interpretable and aligned with the chosen signal phase. The project code is available at https://github.com/yyccbb/FYP_LLMTSC.
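
The critic-guided scoring step described above can be illustrated with a minimal sketch. It assumes that each sampled language-model completion ends in a discrete signal phase, that the frozen critic's Q-value for that phase is used directly as the completion's score, and that scores are normalized within the sampled group in the usual GRPO manner (subtract the group mean, divide by the group standard deviation). The function name, the Q-values, and the sampled phases are hypothetical and are not taken from the project code.

```python
import math
from typing import List, Sequence


def group_relative_advantages(scores: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize scores within one sampled group (GRPO-style advantages)."""
    n = len(scores)
    mean = sum(scores) / n
    # Population standard deviation of the group's scores.
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    # eps guards against division by zero when all scores are equal.
    return [(s - mean) / (std + eps) for s in scores]


# Hypothetical Q-values from a frozen critic for one intersection state,
# indexed by signal phase (assumption: four phases).
q_values = [1.0, 3.0, 2.0, 0.0]

# Hypothetical phases chosen by four completions sampled for the same state.
sampled_phases = [1, 1, 3, 2]

# Each completion is scored by the critic's value for the phase it chose.
scores = [q_values[p] for p in sampled_phases]
advantages = group_relative_advantages(scores)
```

Under these assumptions, completions that pick a phase the critic values above the group average receive a positive advantage and the rest a negative or zero one, which is the dense per-state signal the abstract contrasts with raw cumulative environment rewards.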

Chenbo Yu • 2026

Related benchmarks

Task	Dataset	Metric	Result	
Traffic Signal Control	Jinan-2	Average Travel Time (ATT)	269.7	52
Traffic Signal Control	Jinan-1	Average Travel Time (ATT)	275.9	42
Traffic Signal Control	Hangzhou (HZ-1)	Average Travel Time (ATT)	316.6	28
Traffic Signal Control	Jinan (JN-3)	Average Travel Time (ATT)	262.2	26
Traffic Signal Control	Hangzhou-2	Average Travel Time (ATT)	333.1	6

