Evolutionary Discovery of Heuristic Policies for Traffic Signal Control

About

Traffic Signal Control (TSC) involves a challenging trade-off: classic heuristics are efficient but oversimplified, while Deep Reinforcement Learning (DRL) achieves high performance yet suffers from poor generalization and opaque policies. Online Large Language Models (LLMs) provide general reasoning but incur high latency and lack environment-specific optimization. To address these issues, we propose Temporal Policy Evolution for Traffic, which uses LLMs as an evolution engine to derive specialized heuristic policies. The framework introduces two key modules: (1) Structured State Abstraction (SSA), converting high-dimensional traffic data into temporal-logical facts for reasoning; and (2) Credit Assignment Feedback (CAF), tracing flawed micro-decisions to poor macro-outcomes for targeted critique. Operating entirely at the prompt level without training, the framework yields lightweight, robust policies optimized for specific traffic environments, outperforming both heuristics and online LLM actors.
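
To make the idea of Structured State Abstraction more concrete, the following is a minimal sketch of how raw per-approach queue histories could be turned into short temporal facts suitable for a prompt. The abstract does not specify the actual fact format, so everything here is an assumption made for illustration: the function names `trend` and `abstract_state`, the `tol` and `congested_at` thresholds, and the wording of the facts are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def trend(series: List[float], tol: float = 0.5) -> str:
    """Classify a short queue-length history as rising, falling, or stable.

    Hypothetical helper: compares the last reading with the first one.
    """
    if len(series) < 2:
        return "stable"
    delta = series[-1] - series[0]
    if delta > tol:
        return "rising"
    if delta < -tol:
        return "falling"
    return "stable"


def abstract_state(
    history: Dict[str, List[float]], congested_at: float = 10.0
) -> List[str]:
    """Turn per-approach queue histories into short textual facts.

    The fact wording and the congestion threshold are assumptions,
    not the format used by the paper.
    """
    facts = []
    for approach in sorted(history):
        series = history[approach]
        if not series:
            continue  # no observations for this approach
        facts.append(
            f"queue({approach}) is {trend(series)} over last {len(series)} steps"
        )
        if series[-1] >= congested_at:
            facts.append(
                f"queue({approach}) is congested (length={series[-1]:g})"
            )
    return facts


# Toy input: queue lengths (vehicles) over the last three control steps.
history = {"north": [4, 7, 12], "east": [5, 5, 5.2], "south": [9, 3, 1]}
facts = abstract_state(history)
for fact in facts:
    print(fact)
```

In a sketch like this, the resulting fact list would be placed into the LLM prompt in place of the raw sensor values, so that the model reasons over trends and threshold crossings rather than over high-dimensional numeric state.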

Ruibing Wang, Shuhan Guo, Zeen Li, Zhen Wang, Quanming Yao • 2025

Related benchmarks

Task                    Dataset   Metric                     Result  Rank
Traffic Signal Control  Hangzhou  Average Travel Time (ATT)  313.6   14
Traffic Signal Control  Jinan-1   Average Travel Time (ATT)  265.6   14
Traffic Signal Control  Jinan-2   Average Travel Time (ATT)  271.7   14
