Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces
About
Decision Transformer applies the Transformer architecture to reinforcement learning, using causal self-attention to model sequences of states, actions, and rewards. Although this method has shown competitive results, this paper investigates whether integrating Mamba, a framework for efficient and effective sequence modeling, into the Decision Transformer architecture can improve performance on sequential decision-making tasks. We evaluate this integration systematically through experiments across a range of decision-making environments, comparing the modified model, Decision Mamba, with the original Decision Transformer. The results suggest that the architecture and training methodology of a neural network can significantly affect its performance on complex tasks, and they highlight the potential of Mamba for improving Transformer-based models in reinforcement learning.
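
To make "sequences of states, actions, and rewards" concrete, here is a minimal sketch of how such a trajectory is commonly flattened into one token stream for a causal sequence model. It assumes the usual Decision Transformer convention of conditioning on returns-to-go (the sum of future rewards) rather than raw rewards; the function names `returns_to_go` and `interleave` are illustrative and are not taken from the paper's code.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step adds its own reward to the future total.
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(
    rtg: Sequence[float], states: Sequence[Any], actions: Sequence[Any]
) -> List[Tuple[str, Any]]:
    """Flatten a trajectory into (return, state, action) triples, one per timestep.

    Assumption: this ordering follows the Decision Transformer convention;
    a causal model (attention or a state space model) then predicts each
    action from the tokens that precede it.
    """
    tokens: List[Tuple[str, Any]] = []
    for g, s, a in zip(rtg, states, actions):
        tokens.extend([("return", g), ("state", s), ("action", a)])
    return tokens


# Toy three-step episode with placeholder states and actions.
rewards = [1.0, 0.0, 2.0]
tokens = interleave(returns_to_go(rewards), ["s0", "s1", "s2"], ["a0", "a1", "a2"])
```

Because the token stream is strictly ordered in time, the causal self-attention layer can in principle be swapped for another causal sequence layer such as Mamba without changing this input format.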
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 93.5 | 155 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 111.9 | 153 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 111.6 | 124 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 89.1 | 97 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 43.8 | 97 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 80.3 | 96 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score | 40.8 | 84 |
| Offline Reinforcement Learning | D4RL Medium Hopper | Normalized Score | 98.5 | 64 |
| Offline Reinforcement Learning | D4RL walker2d medium-replay | Normalized Score | 72.5 | 62 |
| Offline Reinforcement Learning | D4RL Medium-Replay Walker2d | Normalized Score | 79.3 | 42 |
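
For readers unfamiliar with the Normalized Score values in the table above, the sketch below shows the standard D4RL normalization, which rescales a policy's raw episode return so that a random policy scores 0 and an expert policy scores 100. The function name and the reference returns in the usage lines are placeholders for illustration, not the actual D4RL reference values for any environment.

```python
def d4rl_normalized_score(
    raw_return: float, random_return: float, expert_return: float
) -> float:
    """Rescale a raw return so random = 0 and expert = 100 (D4RL convention)."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Placeholder reference returns, chosen only to make the arithmetic easy to follow.
halfway = d4rl_normalized_score(raw_return=50.0, random_return=0.0, expert_return=100.0)
above_expert = d4rl_normalized_score(raw_return=110.0, random_return=0.0, expert_return=100.0)
```

Under this convention, a score above 100, such as the 111.9 reported for hopper-medium-expert, means the evaluated policy's average return exceeded the expert reference return.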