Emergency Preemption Without Online Exploration: A Decision Transformer Approach
About
Emergency vehicle (EV) response time is a critical determinant of survival outcomes, yet deployed signal preemption strategies remain reactive and give dispatchers no control over how aggressively a corridor is cleared. We propose a return-conditioned framework for emergency corridor optimization based on the Decision Transformer (DT). By casting corridor optimization as offline, return-conditioned sequence modeling, our approach (1) requires no online environment interaction during policy learning, (2) enables dispatch-level urgency control through a single target-return scalar, and (3) extends to multi-agent settings via a Multi-Agent Decision Transformer (MADT) that uses graph attention for spatial coordination. On the LightSim simulator, DT reduces average EV travel time by 37.7% relative to fixed-timing preemption on a 4x4 grid (88.6 s vs. 142.3 s) and achieves the lowest civilian delay (11.3 s/veh) and the fewest EV stops (1.2) of all methods, including online RL baselines that require environment interaction. MADT scales better to larger grids: on the 8x8 grid its graph-attention coordination overtakes DT, achieving a 45.2% reduction in EV travel time. Return conditioning provides a smooth dispatch interface: varying the target return from 100 to −400 trades EV travel time (72.4–138.2 s) against civilian delay (16.8–5.4 s/veh) without retraining. A Constrained DT extension adds an explicit civilian-disruption budget as a second control knob.
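The return-conditioning idea can be illustrated with a minimal sketch of the generic Decision Transformer bookkeeping, not the authors' implementation. It assumes undiscounted returns-to-go (as in the original DT formulation), an unspecified per-step reward, and hypothetical helper names (`returns_to_go`, `build_tokens`, `conditioning_schedule`); the paper's actual reward definition, state/action encodings, and tokenization are not given here.

```python
from typing import Any, List, Tuple


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted suffix sums: R_t = r_t + r_{t+1} + ... + r_T."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_tokens(
    rewards: List[float], states: List[Any], actions: List[Any]
) -> List[Tuple[str, Any]]:
    """Interleave (R_t, s_t, a_t) triples, the token order a DT is trained on.

    States and actions are opaque placeholders here (hypothetical encodings).
    """
    tokens: List[Tuple[str, Any]] = []
    for R, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("rtg", R), ("state", s), ("action", a)])
    return tokens


def conditioning_schedule(initial_target: float, observed_rewards: List[float]) -> List[float]:
    """At rollout time, the dispatcher's target return is decremented by each
    observed reward, so later steps are conditioned on the return still 'owed'."""
    targets = [initial_target]
    for r in observed_rewards:
        targets.append(targets[-1] - r)
    return targets


if __name__ == "__main__":
    print(returns_to_go([-1.0, -2.0, 5.0]))            # [2.0, 3.0, 5.0]
    print(conditioning_schedule(100.0, [-10.0, -20.0]))  # [100.0, 110.0, 130.0]
```

Under this sketch, the dispatch "urgency knob" is simply `initial_target`: the trained model is unchanged, and only the first return token (and hence every decremented token after it) shifts, which is why no retraining is needed.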
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Traffic Signal Control | 4x4 grid (100 evaluation episodes) | ETT (s) | 88.6 | 9 |
| EV Travel Time Optimization | Grid 4x4 100 episodes 1.0 | Estimated Travel Time (s) | 88.6 | 3 |
| EV Travel Time Optimization | Grid 6x6 100 episodes 1.0 | Expected Travel Time (s) | 148.7 | 3 |
| EV Travel Time Optimization | Grid 8x8 100 episodes 1.0 | Estimated Travel Time (s) | 226.3 | 3 |