HALO: Hierarchical Reinforcement Learning for Large-Scale Adaptive Traffic Signal Control
About
Adaptive traffic signal control (ATSC) is essential for mitigating urban congestion in modern smart cities, where traffic infrastructure is evolving into interconnected Web-of-Things (WoT) environments with thousands of sensing-and-control nodes. However, existing methods face a critical scalability-coordination tradeoff: centralized approaches optimize global objectives but become computationally intractable at city scale, while decentralized multi-agent methods scale efficiently yet lack network-level coherence, resulting in suboptimal performance. In this paper, we present HALO, a hierarchical reinforcement learning framework that addresses this tradeoff for large-scale ATSC. HALO decouples decision-making into two levels: a high-level global guidance policy employs Transformer-LSTM encoders to model spatio-temporal dependencies across the entire network and broadcast compact guidance signals, while low-level local intersection policies execute decentralized control conditioned on both local observations and global context. To better align global and local objectives, we introduce an adversarial goal-setting mechanism in which the global policy proposes challenging-yet-feasible network-level targets that local policies are trained to surpass, fostering robust coordination. We evaluate HALO extensively on multiple standard benchmarks and on a newly constructed large-scale Manhattan-like network with 2,668 intersections under real-world traffic patterns, including peak transitions, adverse weather, and holiday surges. On the small-scale benchmarks, HALO is competitive and becomes increasingly dominant as network complexity grows; in all large-scale regimes it delivers the strongest performance, with up to 6.8% lower average travel time and 5.0% lower average delay than the best state-of-the-art baseline.
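The two-level data flow described above can be illustrated with a toy sketch. This is not the HALO implementation: the function names, the four-phase queue-length observation, the `weight` parameter, and the greedy scoring rule are all hypothetical, and simple mean pooling stands in for the Transformer-LSTM encoders. The adversarial goal-setting mechanism and all learning are omitted. The sketch only shows how one network-wide guidance vector might be broadcast and combined with each intersection's local observation.

```python
from statistics import mean

NUM_PHASES = 4  # assumption: every intersection has four signal phases


def global_guidance(network_obs):
    """High-level stand-in: compress the whole network into one compact vector.

    Mean pooling per phase is a placeholder for a learned encoder.
    network_obs is a list of per-intersection queue lengths, one per phase.
    """
    return [mean(obs[p] for obs in network_obs) for p in range(NUM_PHASES)]


def local_policy(local_obs, guidance, weight=0.5):
    """Low-level stand-in: pick a phase from local queues plus global context.

    The additive score and `weight` are illustrative assumptions only.
    """
    scores = [local_obs[p] + weight * guidance[p] for p in range(NUM_PHASES)]
    # max() returns the first index on ties, so ties go to the lowest phase.
    return max(range(NUM_PHASES), key=scores.__getitem__)


def control_step(network_obs):
    """One decision step: broadcast guidance, then act locally everywhere."""
    guidance = global_guidance(network_obs)
    return [local_policy(obs, guidance) for obs in network_obs]


if __name__ == "__main__":
    obs = [[3, 1, 0, 2], [0, 5, 1, 1], [2, 2, 2, 2]]
    print(control_step(obs))
```

In this toy input, the third intersection has equal queues on every phase, so its choice is decided entirely by the broadcast guidance; each local decision still costs only a constant amount of work per intersection once the guidance vector is computed.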
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Adaptive Traffic Signal Control | Grid5x5 | Average Trip Time (s) | 204.3 | 20 |
| Adaptive Traffic Signal Control | Grid4x4 | Average Trip Time (s) | 159.1 | 12 |
| Adaptive Traffic Signal Control | Manhattan2668 | Average Trip Time (s) | 861.6 | 12 |
| Adaptive Traffic Signal Control | Arterial4x4 | Average Trip Time (s) | 341.4 | 12 |
| Adaptive Traffic Signal Control | Ingolstadt21 | Average Trip Time (s) | 272.5 | 12 |
| Adaptive Traffic Signal Control | Manhattan2668 Peak Transition | Average Trip Time (s) | 690.9 | 12 |
| Adaptive Traffic Signal Control | Manhattan2668 Adverse Weather | Average Trip Time (s) | 913.8 | 12 |
| Adaptive Traffic Signal Control | Manhattan2668 (Holiday Rush) | Average Trip Time (s) | 980.2 | 12 |
| Adaptive Traffic Signal Control | Cologne8 | Average Trip Time (s) | 90.83 | 12 |