Emergency Preemption Without Online Exploration: A Decision Transformer Approach

About

Emergency vehicle (EV) response time is a critical determinant of survival outcomes, yet deployed signal preemption strategies remain reactive and offer no adjustable control. We propose a return-conditioned framework for emergency corridor optimization based on the Decision Transformer (DT). By casting corridor optimization as offline, return-conditioned sequence modeling, our approach (1) requires no online environment interaction during policy learning, (2) enables dispatch-level urgency control through a single target-return scalar, and (3) extends to multi-agent settings through a Multi-Agent Decision Transformer (MADT) that uses graph attention for spatial coordination. On the LightSim simulator, DT reduces average EV travel time by 37.7% relative to fixed-timing preemption on a 4x4 grid (88.6 s vs. 142.3 s). It also achieves the lowest civilian delay (11.3 s/veh) and the fewest EV stops (1.2) of all methods evaluated, including online RL baselines that require environment interaction. On larger grids MADT overtakes DT, reaching a 45.2% reduction in EV travel time on the 8x8 grid through graph-attention coordination. Return conditioning yields a smooth dispatch interface: lowering the target return from 100 to −400 raises EV travel time from 72.4 s to 138.2 s while cutting civilian delay from 16.8 s/veh to 5.4 s/veh, with no retraining. A Constrained DT extension adds an explicit civilian-disruption budget as a second control knob.
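The "single target-return scalar" works the way return conditioning does in a standard Decision Transformer: during offline training each step of a logged trajectory is paired with its return-to-go (the sum of rewards from that step onward), and at inference the dispatcher supplies one target return that is decremented by each reward actually received. The sketch below illustrates only that generic mechanism, assuming the framework follows the standard DT convention; the function names are hypothetical, and the LightSim reward definition is not reproduced here. It also recomputes the 37.7% figure from the reported travel times.

```python
"""Hypothetical sketch of DT-style return conditioning (not the authors' code)."""
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """Suffix sums R_t = r_t + r_{t+1} + ... : the return-to-go token paired
    with each step of an offline trajectory during training."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry accumulates all rewards from t onward.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def rollout_targets(target_return: float, observed_rewards: List[float]) -> List[float]:
    """Inference-time conditioning: start from the dispatcher's target return
    and subtract each reward received, giving the token fed at the next step."""
    targets = [target_return]
    for r in observed_rewards:
        targets.append(targets[-1] - r)
    return targets


def percent_reduction(baseline: float, method: float) -> float:
    """Relative reduction of `method` versus `baseline`, in percent."""
    return 100.0 * (baseline - method) / baseline


if __name__ == "__main__":
    print(returns_to_go([-1.0, -2.0, -3.0]))          # [-6.0, -5.0, -3.0]
    print(rollout_targets(100.0, [-10.0, -5.0]))      # [100.0, 110.0, 115.0]
    print(round(percent_reduction(142.3, 88.6), 1))   # 37.7 (4x4, DT vs. fixed timing)
```

Under the same assumption, a Constrained DT could carry a second conditioning sequence holding the remaining civilian-disruption budget, decremented by the disruption incurred at each step exactly as `rollout_targets` decrements the return; how the authors encode that budget is not stated here.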

Haoran Su, Hanxiao Deng, Yandong Sun · 2026

Related benchmarks

Task	Dataset	Metric	Result	Rank
Traffic Signal Control	4x4 grid (100 evaluation episodes)	ETT (s)	88.6	9
EV Travel Time Optimization	Grid 4x4 100 episodes 1.0	Estimated Travel Time (s)	88.6	3
EV Travel Time Optimization	Grid 6x6 100 episodes 1.0	Expected Travel Time (s)	148.7	3
EV Travel Time Optimization	Grid 8x8 100 episodes 1.0	Estimated Travel Time (s)	226.3	3
