Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy
About
The recent development of connected and automated vehicle (CAV) technologies has spurred investigations into optimizing dense urban traffic to maximize vehicle speed and throughput. This paper explores advisory autonomy, in which real-time driving advisories are issued to human drivers, bringing the performance of automated vehicles within near-term reach. Due to the complexity of traffic systems, recent studies on coordinating CAVs have resorted to deep reinforcement learning (RL). Coarse-grained advisory is formalized as zero-order holds, and we consider hold durations ranging from 0.1 to 40 seconds. However, despite the similarity to the higher-frequency tasks on CAVs, a direct application of deep RL fails to generalize to advisory autonomy tasks. To overcome this, we use zero-shot transfer: we train policies on a set of source tasks (specific traffic scenarios with designated hold durations) and then evaluate these policies on different target tasks. We introduce Temporal Transfer Learning (TTL) algorithms, which systematically leverage the temporal structure of the tasks to select the source tasks for zero-shot transfer that maximize performance across the full range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL solves the tasks more reliably than baselines. This paper underscores the potential of coarse-grained advisory autonomy with TTL in traffic flow optimization.
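The zero-order hold formalization mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `zero_order_hold` and `hold_steps_from_duration`, the toy policy, and the 0.1-second simulation step are all assumptions made for illustration. The idea shown is that the policy is queried only once per hold period and its last advisory is repeated in between.

```python
from typing import Callable, List, Sequence


def hold_steps_from_duration(hold_duration_s: float, sim_step_s: float = 0.1) -> int:
    """Convert a hold duration in seconds to a number of simulation steps.

    sim_step_s = 0.1 is an assumed simulation step, not a value from the paper.
    """
    return max(1, round(hold_duration_s / sim_step_s))


def zero_order_hold(
    policy: Callable[[float], float],
    observations: Sequence[float],
    hold_steps: int,
) -> List[float]:
    """Query the policy every `hold_steps` steps and hold its output in between."""
    if hold_steps < 1:
        raise ValueError("hold_steps must be at least 1")
    actions: List[float] = []
    held_action = 0.0
    for t, obs in enumerate(observations):
        if t % hold_steps == 0:
            # A new advisory is issued only at the start of each hold period.
            held_action = policy(obs)
        actions.append(held_action)
    return actions


if __name__ == "__main__":
    # Hypothetical toy policy: advise twice the observed value.
    advisories = zero_order_hold(lambda obs: 2 * obs, [1, 2, 3, 4, 5], hold_steps=2)
    print(advisories)
```

With `hold_steps=1` the wrapper reduces to querying the policy at every step, which corresponds to the fine-grained end of the hold-duration range; larger values give coarser advisories.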
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Traffic Signal Control | Traffic Signal Speed Limit variation | Normalized Reward | 88.74 | 6 |
| Advisory autonomy | Advisory Autonomy Highway ramp (Acceleration guidance) | Normalized Reward | 0.657 | 6 |
| Dynamic eco-driving | Eco-Driving Inflow variation | Normalized Reward | 0.5299 | 6 |
| Advisory autonomy | Advisory Autonomy Single lane ring (Speed guidance) | Normalized Reward | 0.9819 | 6 |
| Advisory autonomy | Advisory Autonomy Highway ramp (Speed guidance) | Normalized Reward | 64.61 | 6 |
| Dynamic eco-driving | Eco-Driving Penetration Rate variation | Normalized Reward | 0.5992 | 6 |
| Dynamic eco-driving | Eco-Driving Green Phase variation | Normalized Reward | 0.4678 | 6 |
| Traffic Signal Control | Traffic Signal Inflow variation | Normalized Reward | 0.8682 | 6 |
| Advisory autonomy | Advisory Autonomy Single lane ring (Acceleration guidance) | Normalized Reward | 90.21 | 6 |
| Traffic Signal Control | Traffic Signal Road Length variation | Normalized Reward | 0.9349 | 6 |