Evolutionary Discovery of Heuristic Policies for Traffic Signal Control

About

Traffic Signal Control (TSC) involves a challenging trade-off: classic heuristics are efficient but oversimplified, while Deep Reinforcement Learning (DRL) achieves high performance yet suffers from poor generalization and opaque policies. Online Large Language Models (LLMs) provide general reasoning but incur high latency and lack environment-specific optimization. To address these issues, we propose Temporal Policy Evolution for Traffic, which uses LLMs as an evolution engine to derive specialized heuristic policies. The framework introduces two key modules: (1) Structured State Abstraction (SSA), which converts high-dimensional traffic data into temporal-logical facts for reasoning; and (2) Credit Assignment Feedback (CAF), which traces poor macro-outcomes back to the flawed micro-decisions that caused them, enabling targeted critique. Operating entirely at the prompt level without training, the framework yields lightweight, robust policies optimized for specific traffic environments, outperforming both heuristics and online LLM actors.

Ruibing Wang, Shuhan Guo, Zeen Li, Zhen Wang, Quanming Yao • 2025
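
The idea of evolving a heuristic signal policy can be illustrated with a minimal sketch. This is not the authors' implementation: the LLM proposal step is swapped for a random perturbation, SSA and CAF are not modeled, and the toy simulator, the threshold-based policy form, and all names (`simulate`, `propose_variants`, `evolve`) are assumptions made purely for illustration. It shows only the outer loop: propose policy variants, score them in an environment, keep the best.

```python
import random
from typing import List, Tuple


def simulate(threshold: float, steps: int = 200, seed: int = 0) -> float:
    """Toy two-approach intersection (hypothetical, not a real simulator).

    The policy switches the green phase when the red approach's queue exceeds
    `threshold` times the green approach's queue. Returns the mean total queue
    length over the episode (lower is better).
    """
    rng = random.Random(seed)
    queues = [0, 0]
    green = 0
    total = 0
    for _ in range(steps):
        # Arrivals: approach 0 is busier than approach 1.
        queues[0] += int(rng.random() < 0.4)
        queues[1] += int(rng.random() < 0.2)
        # The green approach discharges one vehicle per step.
        if queues[green] > 0:
            queues[green] -= 1
        # Heuristic switching rule controlled by the evolved parameter.
        red = 1 - green
        if queues[red] > threshold * max(queues[green], 1):
            green = red
        total += sum(queues)
    return total / steps


def propose_variants(parent: float, rng: random.Random, n: int = 4) -> List[float]:
    """Stand-in for the LLM proposal step: randomly perturbs the parent policy."""
    return [max(0.1, parent + rng.uniform(-0.5, 0.5)) for _ in range(n)]


def evolve(generations: int = 10, seed: int = 0) -> Tuple[float, List[float]]:
    """Elitist search: a candidate replaces the incumbent only if it scores better."""
    rng = random.Random(seed)
    best = 1.0
    best_score = simulate(best)
    history = [best_score]
    for _ in range(generations):
        for candidate in propose_variants(best, rng):
            score = simulate(candidate)
            if score < best_score:
                best, best_score = candidate, score
        history.append(best_score)
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print(f"best threshold: {best:.3f}")
    print("best score per generation:", [round(s, 2) for s in history])
```

Because the incumbent is only replaced by a strictly better candidate, the best score never gets worse from one generation to the next. In the framework described above, the perturbation step would instead be an LLM prompted with abstracted state facts and targeted feedback.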

Related benchmarks

Task	Dataset	Metric	Result	
Traffic Signal Control	Jinan-2	Average Travel Time (ATT)	271.7	52
Traffic Signal Control	Jinan-1	Average Travel Time (ATT)	265.6	42
Traffic Signal Control	Hangzhou	Average Travel Time (ATT)	313.6	14

