Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation
About
This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state observations. Existing behavior-targeted attacks have limitations, such as requiring white-box access to the victim's policy. To address this, we propose a novel attack method that uses imitation learning from adversarial demonstrations; it works under limited access to the victim's policy and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes affects defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
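To make the defense idea concrete, the following is a minimal sketch (not the paper's implementation) of a time-discounted regularizer: it penalizes the policy's sensitivity to bounded state perturbations, down-weighting later timesteps by `gamma**t` so that robustness is emphasized early in the trajectory, matching the theoretical insight above. The function name, the squared-difference sensitivity measure, and the uniform perturbation model are all illustrative assumptions.

```python
import numpy as np

def time_discounted_smoothness_penalty(policy, states, gamma=0.99, eps=0.05, rng=None):
    """Illustrative sketch of a time-discounted regularizer.

    policy: callable mapping a state array to an action array (assumed deterministic here).
    states: list of state arrays along one trajectory, ordered by timestep.
    gamma:  discount that weights early timesteps more heavily.
    eps:    radius of the bounded perturbation applied to each state.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    penalty = 0.0
    for t, s in enumerate(states):
        # Bounded random perturbation standing in for an adversarial one.
        delta = rng.uniform(-eps, eps, size=s.shape)
        # Sensitivity of the policy output to the perturbation at timestep t.
        diff = policy(s + delta) - policy(s)
        # gamma**t discounts later timesteps, so early-trajectory
        # sensitivity dominates the penalty.
        penalty += (gamma ** t) * float(np.sum(diff ** 2))
    return penalty
```

In training, such a term would be added to the task loss so the policy stays accurate while becoming less sensitive to observation perturbations where it matters most.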
Related benchmarks
| Task | Dataset | Best Attack Reward | Rank |
|---|---|---|---|
| drawer-close | Meta-World v2 (test) | 4.86e+3 | 7 |
| door-lock | Meta-World v2 (test) | 487 | 7 |
| faucet-close | Meta-World v2 (test) | 1.79e+3 | 7 |
| faucet-open | Meta-World v2 (test) | 1.94e+3 | 7 |
| handle-press-side | Meta-World v2 (test) | 1.93e+3 | 7 |
| door-unlock | Meta-World v2 (test) | 691 | 7 |
| drawer-open | Meta-World v2 (test) | 378 | 7 |
| handle-pull-side | Meta-World v2 (test) | 7 | 7 |
| window-close | Meta-World v2 (test) | 482 | 7 |
| window-open | Meta-World v2 (test) | 254 | 7 |