Recurrent Action Transformer with Memory

About

Transformers have become increasingly popular in offline reinforcement learning (RL) due to their ability to treat agent trajectories as sequences, reframing policy learning as a sequence modeling task. However, in partially observable environments (POMDPs), effective decision-making depends on retaining information about past events -- something that standard transformers struggle with due to the quadratic complexity of self-attention, which limits their context length. One solution to this problem is to extend transformers with memory mechanisms. We propose the Recurrent Action Transformer with Memory (RATE), a novel transformer-based architecture for offline RL that incorporates a recurrent memory mechanism designed to regulate information retention. We evaluate RATE across a diverse set of environments: memory-intensive tasks (ViZDoom-Two-Colors, T-Maze, Memory Maze, Minigrid-Memory, and POPGym), as well as standard Atari and MuJoCo benchmarks. Our comprehensive experiments demonstrate that RATE significantly improves performance in memory-dependent settings while remaining competitive on standard tasks across a broad range of baselines. These findings underscore the pivotal role of integrated memory mechanisms in offline RL and establish RATE as a unified, high-capacity architecture for effective decision-making over extended horizons. Code: https://sites.google.com/view/rate-model/.
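The segment-level recurrent memory described in the abstract can be sketched as follows. This is an illustrative toy in numpy, not the authors' implementation: the trajectory is split into fixed-length segments, a small set of memory vectors is prepended to each segment's tokens, a single attention block processes the concatenation, and the updated memory slots are carried into the next segment, letting information propagate beyond the attention window. All names, dimensions, and the single shared weight matrix `W` are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(q, k, v):
    """Plain scaled dot-product attention with a row-wise softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def process_segment(tokens, memory, W):
    """Run one attention block over [memory; tokens].

    Returns the outputs for the segment tokens and the updated memory
    (the hidden states at the positions where the memory vectors sat).
    """
    x = np.concatenate([memory, tokens], axis=0)
    h = attention(x @ W, x @ W, x @ W)
    n_mem = memory.shape[0]
    return h[n_mem:], h[:n_mem]

d, n_mem, seg_len, n_seg = 16, 4, 8, 5
W = rng.normal(scale=d ** -0.5, size=(d, d))
memory = np.zeros((n_mem, d))                       # initial memory state
trajectory = rng.normal(size=(n_seg * seg_len, d))  # embedded trajectory tokens

outputs = []
for i in range(n_seg):
    seg = trajectory[i * seg_len:(i + 1) * seg_len]
    out, memory = process_segment(seg, memory, W)   # memory flows forward
    outputs.append(out)

outputs = np.concatenate(outputs)
print(outputs.shape)  # (40, 16)
```

The key point the sketch illustrates is that attention cost grows with the segment length plus the (constant) number of memory slots, not with the full trajectory length, while the recurrently updated memory is what retains information from earlier segments.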

Egor Cherepanov, Alexey Staroverov, Alexey K. Kovalev, Aleksandr I. Panov • 2023

Related benchmarks

Task: Offline Reinforcement Learning

Dataset                            Normalized Score   Rank
D4RL halfcheetah-medium-expert     87.4               155
D4RL hopper-medium-expert          112.5              153
D4RL walker2d-medium-expert        108.7              124
D4RL Medium-Replay Hopper          83.7               97
D4RL Medium HalfCheetah            43.5               97
D4RL Medium Walker2d               80.7               96
D4RL Medium-Replay HalfCheetah     39.0               84
D4RL Medium Hopper                 77.4               64
D4RL Medium-Replay Walker2d        73.7               42
D4RL Gym locomotion v2 (various)   43.5 (HalfCheetah Medium)   16

(Showing 10 of 25 rows.)
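For context on the "Normalized Score" metric in the table: D4RL rescales raw episode return so that 0 corresponds to a random policy and 100 to an expert policy. The function below shows the standard formula; the reference scores in the example are made-up placeholders, not the official D4RL constants.

```python
def normalized_score(raw, random_score, expert_score):
    """D4RL-style normalization: 0 = random policy, 100 = expert policy."""
    return 100.0 * (raw - random_score) / (expert_score - random_score)

# Example with hypothetical reference scores (not D4RL's real constants):
score = normalized_score(5000.0, random_score=-300.0, expert_score=12000.0)
print(round(score, 1))  # 43.1
```

Scores above 100, such as 112.5 on hopper-medium-expert, simply mean the policy's raw return exceeds the expert reference return.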
