Delving into adversarial attacks on deep policies
About
Adversarial examples have been shown to exist for a variety of deep learning architectures. Deep reinforcement learning has shown promising results on training agent policies directly from raw inputs such as image pixels. In this paper we present a novel study of adversarial attacks on deep reinforcement learning policies. We compare the effectiveness of attacks using adversarial examples versus random noise. We present a novel method, based on the value function, for reducing the number of times adversarial examples need to be injected for a successful attack. We further explore how re-training on random noise and FGSM perturbations affects resilience against adversarial examples.
Jernej Kos, Dawn Song • 2017
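The FGSM perturbations mentioned in the abstract follow the Fast Gradient Sign Method of Goodfellow et al.: each input pixel is shifted by a small amount ε in the direction of the sign of the loss gradient. The sketch below illustrates only this generic perturbation step, not the paper's specific attack pipeline; the function name, ε value, and toy arrays are illustrative assumptions.

```python
import numpy as np

def fgsm_perturb(observation, grad, epsilon=0.01):
    # FGSM: move each pixel by epsilon in the direction that
    # increases the loss, using only the sign of the gradient.
    adv = observation + epsilon * np.sign(grad)
    # Keep the perturbed observation a valid image in [0, 1].
    return np.clip(adv, 0.0, 1.0)

# Toy example: a 2x2 "frame" and a made-up loss gradient.
obs = np.array([[0.5, 0.2], [0.9, 0.0]])
grad = np.array([[0.3, -0.1], [0.0, 2.0]])
print(fgsm_perturb(obs, grad, epsilon=0.05))
```

In an attack on a deep policy, `grad` would be the gradient of a policy loss (e.g. cross-entropy against the agent's chosen action) with respect to the input frame, so the perturbation nudges the agent toward a different action while staying visually small.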
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Adversarial Attack | Pong | Cumulative Reward | 20.21 | 80 |
| Adversarial Attack | Seaquest | Cumulative Reward | 300.5 | 80 |
| Adversarial Attack | Qbert | Cumulative Reward | 680.3 | 80 |
| Adversarial Attack | Space Invaders | Cumulative Reward | 185.4 | 80 |
| Adversarial Attack | Breakout (Black-box discrete, test) | Cumulative Reward | 95.42 | 36 |
| Adversarial Attack | Breakout (White-box discrete, test) | Cumulative Reward | 38.79 | 36 |