X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner
About
The effectiveness of traffic light control has been significantly improved by recent reinforcement learning-based approaches through better cooperation among multiple traffic lights. However, a persistent issue remains: how can we obtain a multi-agent traffic signal control algorithm that transfers well across diverse cities? In this paper, we propose a Transformer on Transformer (TonT) model for cross-city meta multi-agent traffic signal control, named X-Light. We input full Markov Decision Process (MDP) trajectories. The Lower Transformer aggregates the states, actions, and rewards of the target intersection and its neighbors within a city, and the Upper Transformer learns general decision trajectories across different cities. This dual-level approach strengthens the model's generalization and transferability. Notably, when transferred directly to unseen scenarios, X-Light outperforms all baseline methods by +7.91% on average, and by up to +16.3% in some cases.
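The two-level data flow described above can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Each time step is represented by hypothetical fixed-length vectors that concatenate an intersection's state, action, and reward. Attention uses identity query/key/value projections instead of learned ones. The Upper Transformer is assumed to apply causally masked attention over the sequence of per-step embeddings. The names `lower_transformer`, `upper_transformer`, and `tont_forward` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector], causal: bool = False) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    ASSUMPTION: identity Q/K/V projections keep the sketch dependency-free;
    a real Transformer learns these projections.
    """
    d = len(tokens[0])
    outputs = []
    for i, q in enumerate(tokens):
        # With causal=True, position i attends only to positions 0..i.
        visible = tokens[: i + 1] if causal else tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        weights = softmax(scores)
        # Each output is a convex combination of the visible tokens.
        outputs.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return outputs


def lower_transformer(target: Vector, neighbors: List[Vector]) -> Vector:
    # Hypothetical: tokens are the target intersection followed by its neighbors,
    # each an (s, a, r) feature vector for the same time step.
    out = self_attention([target] + neighbors)
    return out[0]  # target token's output = aggregated per-step embedding


def upper_transformer(step_embeddings: List[Vector]) -> Vector:
    # Hypothetical: causal attention over the trajectory of per-step embeddings.
    out = self_attention(step_embeddings, causal=True)
    return out[-1]  # summary of the trajectory so far, for a policy head


def tont_forward(trajectory: List[Tuple[Vector, List[Vector]]]) -> Vector:
    """trajectory: one (target_vector, neighbor_vectors) pair per time step."""
    step_embeddings = [lower_transformer(t, nbrs) for t, nbrs in trajectory]
    return upper_transformer(step_embeddings)


# Usage with made-up 3-dimensional (state, action, reward) vectors.
traj = [
    ([1.0, 0.0, 0.5], [[0.0, 1.0, 0.2], [0.5, 0.5, 0.1]]),
    ([0.8, 0.2, 0.4], [[0.1, 0.9, 0.3]]),
]
summary = tont_forward(traj)
```

The point of the sketch is the nesting: attention across space (target plus neighbors) inside each step, then attention across time over the resulting embeddings. In a meta-learning setup, the same weights would be trained on trajectories from several cities.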
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Adaptive Traffic Signal Control | Grid5x5 | Average Trip Time (s) | 220.6 | 20 |
| Adaptive Traffic Signal Control | Cologne8 | Average Trip Time (s) | 88.55 | 12 |
| Adaptive Traffic Signal Control | Arterial4x4 | Average Trip Time (s) | 349.6 | 12 |
| Adaptive Traffic Signal Control | Ingolstadt21 | Average Trip Time (s) | 278.1 | 12 |
| Adaptive Traffic Signal Control | Manhattan2668 (Holiday Rush) | Average Trip Time (s) | 1.05e+3 | 12 |
| Adaptive Traffic Signal Control | Grid4x4 | Average Trip Time (s) | 162.5 | 12 |
| Adaptive Traffic Signal Control | Manhattan2668 | Average Trip Time (s) | 999.6 | 12 |
| Adaptive Traffic Signal Control | Manhattan2668 (Peak Transition) | Average Trip Time (s) | 843.4 | 12 |
| Adaptive Traffic Signal Control | Manhattan2668 (Adverse Weather) | Average Trip Time (s) | 1.10e+3 | 12 |
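All results above are average trip times in seconds, so lower is better. The sketch below assumes the conventional definition (mean of arrival time minus departure time over completed vehicle trips) and assumes that a percentage gain such as "+7.91%" denotes a relative reduction in trip time versus a baseline. Neither definition is stated on this page. The function names and the input numbers in the usage lines are hypothetical and are not taken from the table.

```python
from typing import Iterable, Tuple


def average_trip_time(trips: Iterable[Tuple[float, float]]) -> float:
    """Mean trip duration in seconds.

    ASSUMPTION: each trip is a (departure_time, arrival_time) pair for a
    vehicle that completed its route; unfinished trips are excluded upstream.
    """
    durations = [arrive - depart for depart, arrive in trips]
    if not durations:
        raise ValueError("no completed trips")
    return sum(durations) / len(durations)


def relative_improvement(baseline: float, ours: float) -> float:
    """Percentage reduction in trip time relative to a baseline.

    ASSUMPTION: a positive value means `ours` is faster, e.g. 10.0 = 10% shorter.
    """
    return 100.0 * (baseline - ours) / baseline


# Hypothetical usage: two vehicles taking 120 s and 240 s.
att = average_trip_time([(0.0, 120.0), (10.0, 250.0)])
gain = relative_improvement(200.0, att)
```

Here `att` is 180.0 seconds, which is 10% shorter than a hypothetical 200-second baseline.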