
Robust Deep Reinforcement Learning against Adversarial Behavior Manipulation

About

This study investigates behavior-targeted attacks on reinforcement learning and their countermeasures. Behavior-targeted attacks aim to manipulate the victim's behavior as desired by the adversary through adversarial interventions in state observations. Existing behavior-targeted attacks suffer from limitations such as requiring white-box access to the victim's policy. To address this, we propose a novel attack method that uses imitation learning from adversarial demonstrations, works under limited access to the victim's policy, and is environment-agnostic. In addition, our theoretical analysis proves that the policy's sensitivity to state changes impacts defense performance, particularly in the early stages of the trajectory. Based on this insight, we propose time-discounted regularization, which enhances robustness against attacks while maintaining task performance. To the best of our knowledge, this is the first defense strategy specifically designed for behavior-targeted attacks.
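The defense idea above can be illustrated with a toy sketch: penalize how much the policy's action changes under small state perturbations, and weight timestep t by a discount factor gamma**t so that early timesteps, where the analysis says sensitivity matters most, dominate the penalty. The linear policy, perturbation model, and weighting below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy(W, s):
    """Toy deterministic linear policy: action = W @ state."""
    return W @ s

def time_discounted_reg(W, states, eps=0.1, gamma=0.9):
    """Hypothetical time-discounted smoothness penalty.

    Sums gamma**t * ||pi(s_t + delta_t) - pi(s_t)||^2 over the trajectory,
    so perturbation sensitivity at early timesteps is weighted more heavily
    than at later ones.
    """
    penalty = 0.0
    for t, s in enumerate(states):
        delta = eps * rng.standard_normal(s.shape)   # small random state perturbation
        diff = policy(W, s + delta) - policy(W, s)   # resulting action shift
        penalty += (gamma ** t) * np.linalg.norm(diff) ** 2
    return penalty

W = rng.standard_normal((2, 4))                      # policy weights (toy)
traj = [rng.standard_normal(4) for _ in range(20)]   # dummy trajectory of states
reg = time_discounted_reg(W, traj)
total_loss = 1.0 + 0.5 * reg                         # task loss + lambda * regularizer (placeholder task loss)
```

In practice the penalty would be added to the RL objective during training; the point of the sketch is only the gamma**t weighting that shifts robustness pressure toward the start of the trajectory.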

Shojiro Yamabe, Kazuto Fukuchi, Jun Sakuma • 2024

Related benchmarks

Task                Dataset                 Result                         Rank
Drawer-Close        Meta-World v2 (test)    Best Attack Reward: 4.86e+3    7
door-lock           Meta-World v2 (test)    Best Attack Reward: 487        7
faucet-close        Meta-World v2 (test)    Best Attack Reward: 1.79e+3    7
faucet-open         Meta-World v2 (test)    Best Attack Reward: 1.94e+3    7
handle-press-side   Meta-World v2 (test)    Best Attack Reward: 1.93e+3    7
door-unlock         Meta-World v2 (test)    Best Attack Reward: 691        7
Drawer-Open         Meta-World v2 (test)    Best Attack Reward: 378        7
handle-pull-side    Meta-World v2 (test)    Best Attack Reward: 7          7
window-close        Meta-World v2 (test)    Best Attack Reward: 482        7
window-open         Meta-World v2 (test)    Best Attack Reward: 254        7
Showing 10 of 41 rows
