Recurrent Action Transformer with Memory
About
Transformers have become increasingly popular in offline reinforcement learning (RL) because they can treat agent trajectories as sequences, reframing policy learning as a sequence modeling task. However, in partially observable environments (POMDPs), effective decision-making depends on retaining information about past events, which standard transformers struggle with: the quadratic complexity of self-attention limits their context length. One solution to this problem is to extend transformers with memory mechanisms. We propose the Recurrent Action Transformer with Memory (RATE), a novel transformer-based architecture for offline RL that incorporates a recurrent memory mechanism designed to regulate information retention. We evaluate RATE across a diverse set of environments: memory-intensive tasks (ViZDoom-Two-Colors, T-Maze, Memory Maze, Minigrid-Memory, and POPGym), as well as standard Atari and MuJoCo benchmarks. Our comprehensive experiments demonstrate that RATE significantly improves performance in memory-dependent settings while remaining competitive with a broad range of baselines on standard tasks. These findings underscore the pivotal role of integrated memory mechanisms in offline RL and establish RATE as a unified, high-capacity architecture for effective decision-making over extended horizons. Code: https://sites.google.com/view/rate-model/.
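The abstract combines two ideas: a trajectory treated as a token sequence, and memory carried across the parts of that sequence that do not fit in one attention window. The pure-Python sketch below illustrates both under stated assumptions. It assumes the Decision Transformer convention of interleaving return-to-go, observation, and action tokens, and a segment-level recurrence in which the memory produced for one segment is fed into the next. `segment_fn` and `remember_cue` are hypothetical stand-ins for the transformer pass; they are not RATE's actual memory update.

```python
from typing import Callable, List, Sequence, Tuple

Token = Tuple[str, object]  # (kind, value) with kind in {"R", "o", "a"}


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted return-to-go: R_t = r_t + r_{t+1} + ... + r_{T-1}."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):  # accumulate from the last step backwards
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_tokens(observations: Sequence, actions: Sequence, rewards: Sequence[float]) -> List[Token]:
    """Flatten a trajectory into interleaved (R_t, o_t, a_t) triplets (assumed layout)."""
    tokens: List[Token] = []
    for R, o, a in zip(returns_to_go(rewards), observations, actions):
        tokens.extend([("R", R), ("o", o), ("a", a)])
    return tokens


def split_segments(tokens: List[Token], steps_per_segment: int) -> List[List[Token]]:
    """Cut the token stream into segments of whole timesteps (3 tokens per step)."""
    size = 3 * steps_per_segment
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def run_recurrent(
    segments: List[List[Token]],
    memory: List[object],
    segment_fn: Callable[[List[object], List[Token]], List[object]],
) -> List[List[object]]:
    """Process segments in order; the memory returned for segment k is passed to segment k+1."""
    states = []
    for seg in segments:
        memory = segment_fn(memory, seg)  # hypothetical stand-in for a transformer pass
        states.append(list(memory))
    return states


def remember_cue(memory: List[object], segment: List[Token]) -> List[object]:
    """Toy rule (hypothetical): store the first observation seen, then keep it unchanged."""
    if memory[0] is None:
        return [segment[1][1]]  # segment[1] is the ("o", o_t) token of the segment's first step
    return memory


# A 6-step, T-Maze-like toy trajectory: the cue is visible only at t = 0.
obs = ["cue=left", "corridor", "corridor", "corridor", "corridor", "junction"]
acts = [0, 0, 0, 0, 0, 1]
rews = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
segs = split_segments(to_tokens(obs, acts, rews), steps_per_segment=2)
states = run_recurrent(segs, [None], remember_cue)
print(states)  # [['cue=left'], ['cue=left'], ['cue=left']]
```

The last segment contains only corridor and junction observations, yet the memory passed into it still holds the cue from the first segment. Each segment attends over a fixed number of tokens, so the per-segment attention cost does not grow with episode length.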
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 87.4 | 155 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 112.5 | 153 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 108.7 | 124 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 83.7 | 97 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 43.5 | 97 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 80.7 | 96 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score | 39 | 84 |
| Offline Reinforcement Learning | D4RL Medium Hopper | Normalized Score | 77.4 | 64 |
| Offline Reinforcement Learning | D4RL Medium-Replay Walker2d | Normalized Score | 73.7 | 42 |
| Offline Reinforcement Learning | D4RL Gym locomotion v2 (various) | HalfCheetah Medium Score | 43.5 | 16 |
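The Normalized Score column follows the D4RL convention, which rescales an episode return so that a random policy scores 0 and an expert reference policy scores 100. Below is a minimal sketch of that formula. It assumes the caller supplies the per-environment random and expert reference returns. The values in the usage line are hypothetical and are not D4RL's reference constants.

```python
def d4rl_normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """Rescale a return so random -> 0 and expert -> 100: 100 * (raw - random) / (expert - random)."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, for illustration only (not D4RL's constants).
print(d4rl_normalized_score(raw_return=1120.0, random_return=0.0, expert_return=1000.0))  # 112.0
```

Scores above 100, such as those reported for hopper-medium-expert and walker2d-medium-expert, mean the evaluated policy's average return exceeded the expert reference return for that environment.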