Advantage-based Temporal Attack in Reinforcement Learning

About

Extensive research demonstrates that Deep Reinforcement Learning (DRL) models are susceptible to adversarially constructed inputs (i.e., adversarial examples), which can mislead the agent into taking suboptimal or unsafe actions. Recent methods improve attack effectiveness by leveraging future rewards to guide adversarial perturbation generation over sequential time steps (i.e., reward-based attacks). However, these methods are unable to capture dependencies between different time steps in the perturbation generation process, resulting in a weak temporal correlation between the current perturbation and previous perturbations. In this paper, we propose a novel method called Advantage-based Adversarial Transformer (AAT), which can generate adversarial examples with stronger temporal correlations (i.e., time-correlated adversarial examples) to improve the attack performance. AAT employs a multi-scale causal self-attention (MSCSA) mechanism to dynamically capture dependencies between historical information from different time periods and the current state, thus enhancing the correlation between the current perturbation and previous perturbations. Moreover, AAT introduces a weighted advantage mechanism, which quantifies the effectiveness of a perturbation in a given state and guides the generation process toward high-performance adversarial examples by sampling high-advantage regions. Extensive experiments demonstrate that the performance of AAT matches or surpasses mainstream adversarial attack baselines on Atari, DeepMind Control Suite, and Google football tasks.
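The abstract does not give the equations behind MSCSA or the weighted advantage mechanism, so the following is only a minimal NumPy sketch of the two general ideas and not the authors' implementation. It assumes that "multi-scale" means causal attention over several look-back windows whose outputs are averaged, and that advantage weighting is a softmax over Q(s, a) - V(s). All function names, the window sizes, and the temperature parameter are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries equal to -inf become 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def causal_window_mask(T, window):
    # True where step i may attend to step j: j <= i and i - j < window.
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    return (j <= i) & (i - j < window)

def causal_attention(x, window):
    # Single-head scaled dot-product attention over a (T, d) sequence.
    # Assumption: no learned query/key/value projections, to keep it short.
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(causal_window_mask(T, window), scores, -np.inf)
    return softmax(scores, axis=-1) @ x

def multi_scale_causal_attention(x, windows=(2, 4, 8)):
    # Assumption: "multi-scale" = average of attention outputs computed
    # with different look-back windows (short to long history).
    return np.mean([causal_attention(x, w) for w in windows], axis=0)

def advantage_weights(q_values, state_value, temperature=1.0):
    # Assumption: weight each candidate by softmax of its advantage
    # A = Q(s, a) - V(s), so high-advantage candidates are sampled more often.
    advantages = np.asarray(q_values) - state_value
    return softmax(advantages / temperature)

rng = np.random.default_rng(0)
history = rng.normal(size=(6, 3))      # 6 time steps, 3 features each
context = multi_scale_causal_attention(history)
weights = advantage_weights([1.0, 2.5, 0.5], state_value=1.2)
```

Because of the causal mask, each row of `context` depends only on the current and earlier time steps, which is the property that lets a perturbation at step t be conditioned on the perturbation history rather than on future states.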

Shenghong He • 2026

Related benchmarks

Task	Dataset	Result
Adversarial Attack	Seaquest	Cumulative Reward: 80.45	80
Adversarial Attack	Pong	Cumulative Reward: -16.06	80
Cumulative Reward	Qbert	Cumulative Reward: 60.22	80
Cumulative Reward	Space Invaders	Cumulative Reward: 68.59	80
Adversarial Attack	Breakout White-box discrete (test)	Cumulative Reward: 8.21	36
Adversarial Attack	Breakout Black-box discrete (test)	Cumulative Reward: 28.32	36
Adversarial Detection	Pong Gym Atari (test)	--	2
Adversarial Detection	Squest Gym Atari (test)	--	2
Adversarial Detection	Qbert Gym Atari (test)	--	2
