Recurrent Action Transformer with Memory

About

Transformers have become increasingly popular in offline reinforcement learning (RL) due to their ability to treat agent trajectories as sequences, reframing policy learning as a sequence modeling task. However, in partially observable environments (POMDPs), effective decision-making depends on retaining information about past events -- something that standard transformers struggle with due to the quadratic complexity of self-attention, which limits their context length. One solution to this problem is to extend transformers with memory mechanisms. We propose the Recurrent Action Transformer with Memory (RATE), a novel transformer-based architecture for offline RL that incorporates a recurrent memory mechanism designed to regulate information retention. We evaluate RATE across a diverse set of environments: memory-intensive tasks (ViZDoom-Two-Colors, T-Maze, Memory Maze, Minigrid-Memory, and POPGym), as well as standard Atari and MuJoCo benchmarks. Our comprehensive experiments demonstrate that RATE significantly improves performance in memory-dependent settings while remaining competitive on standard tasks across a broad range of baselines. These findings underscore the pivotal role of integrated memory mechanisms in offline RL and establish RATE as a unified, high-capacity architecture for effective decision-making over extended horizons. Code: https://sites.google.com/view/rate-model/.

Egor Cherepanov, Alexey Staroverov, Alexey K. Kovalev, Aleksandr I. Panov • 2023
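
The abstract describes carrying information across a long trajectory through a recurrent memory rather than one long attention window. The following is a minimal sketch of that general idea only, not the RATE implementation: it assumes the trajectory is cut into fixed-length segments and that a small memory state is passed from one segment to the next. The names `split_into_segments`, `run_with_memory`, and `toy_step` are hypothetical, and `toy_step` is a stand-in for a transformer block (it simply remembers the first non-zero cue it sees, loosely in the spirit of a T-Maze-style cue).

```python
from typing import Callable, List, Sequence, Tuple

Memory = List[float]
# A step function maps (segment, memory) -> (per-token outputs, updated memory).
StepFn = Callable[[List[float], Memory], Tuple[List[float], Memory]]


def split_into_segments(trajectory: Sequence[float], segment_len: int) -> List[List[float]]:
    """Cut a trajectory into consecutive chunks of at most `segment_len` tokens."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [
        list(trajectory[i:i + segment_len])
        for i in range(0, len(trajectory), segment_len)
    ]


def run_with_memory(
    trajectory: Sequence[float],
    segment_len: int,
    step_fn: StepFn,
    init_memory: Memory,
) -> Tuple[List[float], Memory]:
    """Process segments in order, feeding each segment the memory left by the previous one."""
    memory = list(init_memory)
    outputs: List[float] = []
    for segment in split_into_segments(trajectory, segment_len):
        segment_outputs, memory = step_fn(segment, memory)
        outputs.extend(segment_outputs)
    return outputs, memory


def toy_step(segment: List[float], memory: Memory) -> Tuple[List[float], Memory]:
    """Hypothetical stand-in for a model block: remember the first non-zero cue."""
    cue = memory[0]
    outputs = []
    for token in segment:
        if cue == 0 and token != 0:
            cue = token
        outputs.append(cue)
    return outputs, [cue]


# The cue (7) appears only in the first segment but is still available in the last one.
outputs, final_memory = run_with_memory([7, 0, 0, 0, 0, 0], 2, toy_step, [0])
```

In this toy run the last segment contains only zeros, so a model that saw that segment in isolation would have no access to the cue; the carried memory is what makes it available.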

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	D4RL halfcheetah-medium-expert	Normalized Score 87.4	169
Offline Reinforcement Learning	D4RL hopper-medium-expert	Normalized Score 112.5	161
Offline Reinforcement Learning	D4RL walker2d-medium-expert	Normalized Score 108.7	132
Offline Reinforcement Learning	D4RL Medium-Replay Hopper	Normalized Score 83.7	109
Offline Reinforcement Learning	D4RL Medium HalfCheetah	Normalized Score 43.5	105
Offline Reinforcement Learning	D4RL Medium Walker2d	Normalized Score 80.7	104
Offline Reinforcement Learning	D4RL Medium-Replay HalfCheetah	Normalized Score 39	97
Offline Reinforcement Learning	D4RL Medium Hopper	Normalized Score 77.4	72
Offline Reinforcement Learning	D4RL Medium-Replay Walker2d	Normalized Score 73.7	52
Offline Reinforcement Learning	D4RL Gym locomotion v2 (various)	HalfCheetah Medium Score 43.5	16

Showing 10 of 25 rows
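
For readers unfamiliar with the "Normalized Score" metric in the table, the usual D4RL convention rescales a raw episode return so that a random policy maps to 0 and an expert policy maps to 100. The sketch below shows that rescaling; the function name `normalized_score` and the reference returns in the example are illustrative placeholders, not the official per-environment D4RL values.

```python
def normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """Rescale a raw return so that random -> 0 and expert -> 100."""
    if expert_return == random_return:
        raise ValueError("expert_return and random_return must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Placeholder reference returns (hypothetical, for illustration only).
example = normalized_score(raw_return=150.0, random_return=0.0, expert_return=200.0)
```

Scores above 100, such as some medium-expert entries in the table, mean the evaluated policy's return exceeded the expert reference return.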
