SleeperNets: Universal Backdoor Poisoning Attacks Against Reinforcement Learning Agents

About

Reinforcement learning (RL) is an actively growing field that is seeing increased usage in real-world, safety-critical applications, making it paramount to ensure the robustness of RL algorithms against adversarial attacks. In this work we explore a particularly stealthy form of training-time attack against RL: backdoor poisoning. Here the adversary intercepts the training of an RL agent with the goal of reliably inducing a particular action when the agent observes a pre-determined trigger at inference time. We uncover theoretical limitations of prior work by proving their inability to generalize across domains and MDPs. Motivated by this, we formulate a novel poisoning attack framework which interlinks the adversary's objectives with those of finding an optimal policy, guaranteeing attack success in the limit. Using insights from our theoretical analysis we develop "SleeperNets" as a universal backdoor attack which exploits a newly proposed threat model and leverages dynamic reward poisoning techniques. We evaluate our attack in 6 environments spanning multiple domains and demonstrate significant improvements in attack success over existing methods, while preserving benign episodic return.
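To make the notion of backdoor poisoning concrete, here is a minimal sketch of a generic, static trigger-and-reward poisoning step applied to one recorded episode. It is not the SleeperNets algorithm, whose dynamic reward poisoning is not specified in this abstract. The names `Step`, `apply_trigger`, and `poison_episode`, the trigger pattern, the poisoning rate, and the fixed reward bonus are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One recorded transition (hypothetical container for this sketch)."""
    obs: Tuple[float, ...]
    action: int
    reward: float


def apply_trigger(obs: Tuple[float, ...], value: float = 1.0, width: int = 2) -> Tuple[float, ...]:
    """Stamp an assumed trigger pattern onto the first `width` entries of the observation."""
    return tuple([value] * width) + tuple(obs[width:])


def poison_episode(
    episode: Sequence[Step],
    target_action: int,
    rate: float,
    bonus: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Step], List[int]]:
    """Poison a fraction `rate` of the steps in a finished episode.

    For each chosen step the observation receives the trigger, and the reward
    is overwritten with +bonus if the agent took `target_action`, else -bonus.
    This is a static reward rule chosen for simplicity, not the dynamic scheme
    named in the abstract.
    """
    rng = rng or random.Random(0)
    n_poison = int(rate * len(episode))
    chosen = set(rng.sample(range(len(episode)), n_poison))
    poisoned = []
    for t, step in enumerate(episode):
        if t in chosen:
            new_reward = bonus if step.action == target_action else -bonus
            step = replace(step, obs=apply_trigger(step.obs), reward=new_reward)
        poisoned.append(step)
    return poisoned, sorted(chosen)


if __name__ == "__main__":
    # Toy episode: 10 steps, alternating actions, constant small reward.
    episode = [Step(obs=(0.0, 0.0, 0.5), action=t % 2, reward=0.1) for t in range(10)]
    poisoned, idx = poison_episode(episode, target_action=1, rate=0.3)
    for t in idx:
        print(t, poisoned[t])
```

An agent trained on such modified episodes is rewarded for taking the target action whenever the trigger is present, while untouched steps keep the original task reward. The abstract's claim is that fixed schemes of this kind do not generalize across domains and MDPs, which is what motivates the dynamic reward poisoning used in the paper.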

Ethan Rathbun, Christopher Amato, Alina Oprea • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot navigation | TurtleBot3 (real-world deployment) | CSR (%) | 88.7 | 10 |
| Backdoor Attack on Reinforcement Learning | Frogger Discrete (evaluation) | Baseline Performance | 476.6 | 5 |
| Backdoor Attack on Reinforcement Learning | Breakout Discrete (evaluation) | Baseline Reward | 489.6 | 5 |
| Backdoor Attack on Reinforcement Learning | Pacman Discrete (evaluation) | Backdoor Rate (BR) | 525.3 | 5 |
| Backdoor Attack on Reinforcement Learning | Q*bert Discrete (evaluation) | BR | 1.72e+4 | 5 |
| Robotic Navigation | Safety Gymnasium Safety Car | ASR | 100 | 3 |
| Robotic Navigation | Car Racing Box2D Gymnasium | Success Rate (ASR) | 100 | 3 |
| Self Driving | Highway Env Merge | ASR | 100 | 3 |
| Stock Trading | Trade BTC Gym Trading Env | ASR | 1 | 3 |
| Video Game Playing | Breakout Atari Gymnasium | ASR | 100 | 3 |
Showing 10 of 11 rows
