
Advantage-based Temporal Attack in Reinforcement Learning

About

Extensive research demonstrates that Deep Reinforcement Learning (DRL) models are susceptible to adversarially constructed inputs (i.e., adversarial examples), which can mislead the agent into taking suboptimal or unsafe actions. Recent methods improve attack effectiveness by using future rewards to guide adversarial perturbation generation over sequential time steps (i.e., reward-based attacks). However, these methods cannot capture dependencies between different time steps in the perturbation generation process, so the current perturbation is only weakly correlated with previous perturbations. In this paper, we propose a novel method called Advantage-based Adversarial Transformer (AAT), which generates adversarial examples with stronger temporal correlations (i.e., time-correlated adversarial examples) to improve attack performance. AAT employs a multi-scale causal self-attention (MSCSA) mechanism to dynamically capture dependencies between the current state and historical information from different time periods, thereby strengthening the correlation between the current perturbation and previous perturbations. Moreover, AAT introduces a weighted advantage mechanism, which quantifies the effectiveness of a perturbation in a given state and guides the generation process toward high-performance adversarial examples by sampling high-advantage regions. Extensive experiments demonstrate that AAT matches or surpasses mainstream adversarial attack baselines on Atari, DeepMind Control Suite, and Google Football tasks.
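The abstract does not specify how MSCSA is built, so the following is only a minimal sketch of the general idea of causal self-attention applied at several time scales, not the paper's implementation. It assumes that "multi-scale" can be read as attending over the last few steps at several window lengths and that the per-scale outputs are simply averaged; the function names, the `windows` parameter, and the averaging step are all hypothetical choices made for illustration.

```python
import math


def causal_window_attention(queries, keys, values, window=None):
    """Scaled dot-product attention in which step t sees only steps
    max(0, t - window + 1)..t, so no future step is ever attended to.

    queries, keys, values: lists of float vectors, one per time step.
    window: number of most recent steps visible (None = full history).
    Returns (outputs, weights); weights[t][s] is 0.0 at masked positions.
    """
    n = len(queries)
    dim = len(queries[0])
    outputs, weights = [], []
    for t in range(n):
        start = 0 if window is None else max(0, t - window + 1)
        # Scores are computed only for the visible (past and current) steps.
        scores = [
            sum(q * k for q, k in zip(queries[t], keys[s])) / math.sqrt(dim)
            for s in range(start, t + 1)
        ]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in scores]
        total = sum(exps)
        row = [0.0] * n
        for offset, e in enumerate(exps):
            row[start + offset] = e / total
        weights.append(row)
        outputs.append([
            sum(row[s] * values[s][d] for s in range(n))
            for d in range(len(values[0]))
        ])
    return outputs, weights


def multi_scale_attention(states, windows=(2, 4, None)):
    """Average causal attention outputs over several window sizes.

    Averaging is an illustrative assumption, not the paper's fusion rule.
    """
    per_scale = [
        causal_window_attention(states, states, states, w)[0] for w in windows
    ]
    n, dim = len(states), len(states[0])
    return [
        [sum(scale[t][d] for scale in per_scale) / len(per_scale)
         for d in range(dim)]
        for t in range(n)
    ]
```

Calling `multi_scale_attention(states)` on a list of per-step feature vectors returns one fused vector per step. Because every scale is causally masked, the output at step t depends only on steps up to t, which is the property that lets a perturbation generator condition on earlier perturbations without seeing future states.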

Shenghong He • 2026

Related benchmarks

Task | Dataset | Result | Rank
Adversarial Attack | Seaquest | Cumulative Reward: 80.45 | 80
Adversarial Attack | Pong | Cumulative Reward: -16.06 | 80
Cumulative Reward | Qbert | Cumulative Reward: 60.22 | 80
Cumulative Reward | Space Invaders | Cumulative Reward: 68.59 | 80
Adversarial Attack | Breakout White-box discrete (test) | Cumulative Reward: 8.21 | 36
Adversarial Attack | Breakout Black-box discrete (test) | Cumulative Reward: 28.32 | 36
Adversarial Detection | Pong Gym Atari (test) | -- | 2
Adversarial Detection | Seaquest Gym Atari (test) | -- | 2
Adversarial Detection | Qbert Gym Atari (test) | -- | 2
