BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning

About

Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which conducts highly sparse backdoor poisoning during training and testing while maintaining successful attacks. BadRL strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to previous methods that use sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on the targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning effort (0.003% of total training steps) during training and infrequent attacks during testing.

Jing Cui, Yufei Han, Yuzhe Ma, Jianbin Jiao, Junge Zhang • 2023
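
The sparse selection idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how attack values are computed, so the scores below are random placeholders, and the helper names `poisoning_budget` and `select_poison_steps` are hypothetical. The sketch only shows how a budget of 0.003% of training steps translates into a step count, and how the highest-scoring steps could be picked under that budget.

```python
from __future__ import annotations

import heapq
import random


def poisoning_budget(total_steps: int, fraction: float) -> int:
    """Number of steps that may be poisoned (at least one)."""
    return max(1, round(total_steps * fraction))


def select_poison_steps(attack_values: list[float], budget: int) -> list[int]:
    """Indices of the `budget` highest-scoring steps, returned in time order."""
    top = heapq.nlargest(
        budget, range(len(attack_values)), key=attack_values.__getitem__
    )
    return sorted(top)


# 0.003% expressed as a fraction is 0.00003.
budget = poisoning_budget(1_000_000, 0.00003)
print(budget)  # 30

# Placeholder scores standing in for the (unspecified) attack values.
rng = random.Random(0)
scores = [rng.random() for _ in range(1_000)]
print(select_poison_steps(scores, 5))
```

Under these assumptions, one million training steps leave room for only 30 poisoned steps, which is why the choice of which states to target matters more than the trigger volume.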

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Robot navigation | TurtleBot3 (real-world deployment) | CSR (%) 88.2 | 10 |
| Self Driving | Highway Env Merge | ASR 57.3 | 3 |
| Robotic Navigation | Safety Gymnasium Safety Car | ASR 70.6 | 3 |
| Robotic Navigation | Car Racing Box2D Gymnasium | Success Rate (ASR) 44.1 | 3 |
| Stock Trading | Trade BTC Gym Trading Env | ASR 0.388 | 3 |
| Video Game Playing | Breakout Atari Gymnasium | ASR 99.3 | 3 |
| Video Game Playing | Qbert Atari Gymnasium | ASR 47.2 | 3 |