SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents
About
Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications, making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work we explore a particularly stealthy form of training-time attack against RL: backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving its inability to generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy, guaranteeing attack success in the limit. Using insights from our theoretical analysis we develop "SleeperNets" as a universal backdoor attack which exploits a newly proposed threat model and leverages dynamic reward poisoning techniques. We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return.
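To make the backdoor setting concrete, the sketch below illustrates the general idea of trigger-based poisoning: a fixed pattern is stamped onto an observation, and the reward is rewritten so that the target action looks attractive whenever the trigger is present. It is a generic illustration, not the SleeperNets algorithm; in particular, it uses a fixed reward bonus rather than the dynamic reward poisoning the paper describes. The function names, the trigger pattern, the `bonus` constant, and the attack-success-rate definition are all assumptions introduced here.

```python
from typing import Callable, List, Sequence, Tuple

Observation = List[float]


def apply_trigger(obs: Sequence[float], trigger_value: float = 1.0,
                  n_cells: int = 2) -> Observation:
    """Return a copy of obs whose first n_cells entries hold the trigger value.

    The trigger pattern (overwrite the first two entries with 1.0) is a
    hypothetical choice made for this illustration.
    """
    poisoned = list(obs)
    for i in range(min(n_cells, len(poisoned))):
        poisoned[i] = trigger_value
    return poisoned


def poison_transition(obs: Sequence[float], action: int, reward: float,
                      target_action: int,
                      bonus: float = 1.0) -> Tuple[Observation, int, float]:
    """Poison one (observation, action, reward) sample.

    The observation receives the trigger, and the original reward is replaced
    by +bonus if the agent took the target action and -bonus otherwise.
    This static rule is an assumed stand-in, not the paper's dynamic scheme.
    """
    poisoned_obs = apply_trigger(obs)
    poisoned_reward = bonus if action == target_action else -bonus
    return poisoned_obs, action, poisoned_reward


def attack_success_rate(policy: Callable[[Observation], int],
                        observations: Sequence[Sequence[float]],
                        target_action: int) -> float:
    """Fraction of triggered observations on which policy picks target_action."""
    if not observations:
        return 0.0
    hits = sum(policy(apply_trigger(o)) == target_action for o in observations)
    return hits / len(observations)
```

As a usage sketch, a policy that returns action `3` whenever the first two observation entries equal `1.0` would score an attack success rate of `1.0` under `attack_success_rate(policy, clean_observations, target_action=3)`, while behaving normally on observations without the trigger.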
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot navigation | TurtleBot3 (real-world deployment) | CSR (%) | 88.7 | 10 |
| Backdoor Attack on Reinforcement Learning | Frogger Discrete (evaluation) | Baseline Performance | 476.6 | 5 |
| Backdoor Attack on Reinforcement Learning | Breakout Discrete (evaluation) | Baseline Reward | 489.6 | 5 |
| Backdoor Attack on Reinforcement Learning | Pacman Discrete (evaluation) | Backdoor Rate (BR) | 525.3 | 5 |
| Backdoor Attack on Reinforcement Learning | Q*bert Discrete (evaluation) | BR | 1.72e+4 | 5 |
| Robotic Navigation | Safety Gymnasium Safety Car | ASR | 100 | 3 |
| Robotic Navigation | Car Racing Box2D Gymnasium | Success Rate (ASR) | 100 | 3 |
| Self Driving | Highway Env Merge | ASR | 100 | 3 |
| Stock Trading | Trade BTC Gym Trading Env | ASR | 1 | 3 |
| Video Game Playing | Breakout Atari Gymnasium | ASR | 100 | 3 |