Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
About
This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state observations. Existing behavior-targeted attacks have limitations, such as requiring white-box access to the victim's policy. To address this, we propose a novel attack method that uses imitation learning from adversarial demonstrations; it works under limited access to the victim's policy and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes affects defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
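To make the defense idea concrete, the following is a minimal sketch (not the paper's implementation) of a time-discounted regularizer: it penalizes the policy's sensitivity to bounded state perturbations, down-weighting later timesteps by `gamma**t` so that robustness is emphasized early in the trajectory, matching the theoretical insight above. The function name, the squared-difference sensitivity measure, and the uniform perturbation model are all illustrative assumptions.

```python
import numpy as np

def time_discounted_smoothness_penalty(policy, states, gamma=0.99, eps=0.05, rng=None):
    """Illustrative sketch of a time-discounted regularizer.

    policy: callable mapping a state array to an action array (assumed deterministic here).
    states: list of state arrays along one trajectory, ordered by timestep.
    gamma:  discount that weights early timesteps more heavily.
    eps:    radius of the bounded perturbation applied to each state.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    penalty = 0.0
    for t, s in enumerate(states):
        # Bounded random perturbation standing in for an adversarial one.
        delta = rng.uniform(-eps, eps, size=s.shape)
        # Sensitivity of the policy output to the perturbation at timestep t.
        diff = policy(s + delta) - policy(s)
        # gamma**t discounts later timesteps, so early-trajectory
        # sensitivity dominates the penalty.
        penalty += (gamma ** t) * float(np.sum(diff ** 2))
    return penalty
```

In training, such a term would be added to the task loss so the policy stays accurate while becoming less sensitive to observation perturbations where it matters most.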
Related benchmarks
| Task | Dataset | Best Attack Reward | Rank |
|---|---|---|---|
| drawer-close | Meta-World v2 (test) | 4.86e+3 | 7 |
| door-lock | Meta-World v2 (test) | 487 | 7 |
| faucet-close | Meta-World v2 (test) | 1.79e+3 | 7 |
| faucet-open | Meta-World v2 (test) | 1.94e+3 | 7 |
| handle-press-side | Meta-World v2 (test) | 1.93e+3 | 7 |
| door-unlock | Meta-World v2 (test) | 691 | 7 |
| drawer-open | Meta-World v2 (test) | 378 | 7 |
| handle-pull-side | Meta-World v2 (test) | 7 | 7 |
| window-close | Meta-World v2 (test) | 482 | 7 |
| window-open | Meta-World v2 (test) | 254 | 7 |