Adversarially Robust Decision Transformer

About

Decision Transformer (DT), as one of the representative Reinforcement Learning via Supervised Learning (RvS) methods, has achieved strong performance in offline learning tasks by leveraging the powerful Transformer architecture for sequential decision-making. However, in adversarial environments, these methods can be non-robust, since the return is dependent on the strategies of both the decision-maker and adversary. Training a probabilistic model conditioned on observed return to predict action can fail to generalize, as the trajectories that achieve a return in the dataset might have done so due to a suboptimal behavior adversary. To address this, we propose a worst-case-aware RvS algorithm, the Adversarially Robust Decision Transformer (ARDT), which learns and conditions the policy on in-sample minimax returns-to-go. ARDT aligns the target return with the worst-case return learned through minimax expectile regression, thereby enhancing robustness against powerful test-time adversaries. In experiments conducted on sequential games with full data coverage, ARDT can generate a maximin (Nash Equilibrium) strategy, the solution with the largest adversarial robustness. In large-scale sequential games and continuous adversarial RL environments with partial data coverage, ARDT demonstrates significantly superior robustness to powerful test-time adversaries and attains higher worst-case returns compared to contemporary DT methods.
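The abstract names minimax expectile regression as the tool for learning worst-case returns. As a rough illustration of that idea only (not the authors' implementation), the sketch below fits a scalar expectile to a set of observed returns: with a small `alpha` the estimate leans toward the lowest in-sample return, approximating a minimum over adversary responses, and a final `max` over protagonist actions picks the most robust one. The function names, the toy `returns_by_action` data, and the choice `alpha=0.01` are all assumptions made for this example.

```python
def expectile_loss(pred, targets, alpha):
    """Asymmetric squared loss: weight alpha where target > pred, else 1 - alpha."""
    total = 0.0
    for t in targets:
        diff = t - pred
        weight = alpha if diff > 0 else 1.0 - alpha
        total += weight * diff * diff
    return total / len(targets)


def fit_expectile(targets, alpha, iters=100):
    """Minimise expectile_loss over a scalar by repeated weighted averaging."""
    pred = sum(targets) / len(targets)  # start from the plain mean
    for _ in range(iters):
        # First-order condition: pred equals the asymmetrically weighted mean.
        weights = [alpha if t > pred else 1.0 - alpha for t in targets]
        pred = sum(w * t for w, t in zip(weights, targets)) / sum(weights)
    return pred


# Hypothetical toy data: returns seen in the dataset for each protagonist
# action, collected against a mix of adversary responses.
returns_by_action = {
    "a0": [1.0, 1.0, -1.0],  # high return only when the adversary played badly
    "a1": [0.0, 0.0, 0.0],   # same return whatever the adversary did
}

# Low expectile approximates the in-sample minimum over adversary responses.
worst_case = {a: fit_expectile(r, alpha=0.01) for a, r in returns_by_action.items()}

# Maximising the worst-case estimate selects the robust action.
robust_action = max(worst_case, key=worst_case.get)
```

In this toy setting, `a0` has the higher average return, so conditioning on observed return alone would favor it, while the low-expectile estimate exposes its poor worst case and `a1` is selected instead.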

Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic • 2024

Related benchmarks

Task	Dataset	Result
Adversarial Reinforcement Learning	Connect Four 30% optimal adversary (test-time)	Average Return: 0.55	3
Adversarial Reinforcement Learning	Connect Four 50% optimal adversary (test-time)	Average Return: 0.11	3
Adversarial Reinforcement Learning	Connect Four 70% optimal adversary (test)	Average Return: 0.02	3
Noisy Action Robust MDP	Hopper MuJoCo Noisy Action Robust MDP (low randomness)	Worst-case Return: 477.9	3
Noisy Action Robust MDP	Hopper MuJoCo Noisy Action Robust MDP medium randomness	Worst-case Return: 482.2	3
Noisy Action Robust MDP	Hopper high randomness MuJoCo Noisy Action Robust MDP	Worst-case Return: 331.7	3
Noisy Action Robust MDP	Walker2D medium randomness MuJoCo Noisy Action Robust MDP	Worst-case Return: 508.4	3
Noisy Action Robust MDP	Halfcheetah MuJoCo Noisy Action Robust MDP (high randomness)	Worst-case Return: 1.77e+3	3
Noisy Action Robust MDP	Walker2D low randomness MuJoCo Noisy Action Robust MDP	Worst-case Return: 405.6	3
Noisy Action Robust MDP	Walker2D high randomness MuJoCo Noisy Action Robust MDP	Worst-case Return: 492.6	3

Showing 10 of 13 rows
