Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces

About

Decision Transformer, a promising approach that applies Transformer architectures to reinforcement learning, relies on causal self-attention to model sequences of states, actions, and rewards. While this method has shown competitive results, this paper investigates the integration of the Mamba framework, known for its advanced capabilities in efficient and effective sequence modeling, into the Decision Transformer architecture, focusing on the potential performance enhancements in sequential decision-making tasks. Our study systematically evaluates this integration by conducting a series of experiments across various decision-making environments, comparing the modified Decision Transformer, Decision Mamba, with its traditional counterpart. This work contributes to the advancement of sequential decision-making models, suggesting that the architecture and training methodology of neural networks can significantly impact their performance in complex tasks, and highlighting the potential of Mamba as a valuable tool for improving the efficacy of Transformer-based models in reinforcement learning scenarios.
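As a rough illustration of the sequence-modeling setup the abstract describes, the sketch below builds the kind of token sequence a Decision Transformer style model consumes. It assumes the return-to-go formulation from the original Decision Transformer work, in which each timestep contributes a (return-to-go, state, action) triple. The function names `returns_to_go` and `interleave_tokens` are hypothetical, and the embeddings are random placeholders rather than learned projections. The sequence mixer that would process these tokens (causal self-attention in Decision Transformer, a Mamba block in Decision Mamba) is deliberately left out.

```python
import numpy as np


def returns_to_go(rewards, gamma=1.0):
    """Suffix sums of rewards: rtg[t] = r[t] + gamma * rtg[t + 1]."""
    rtg = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def interleave_tokens(rtg_emb, state_emb, action_emb):
    """Interleave per-timestep embeddings as (R_1, s_1, a_1, R_2, s_2, a_2, ...).

    Each input has shape (T, d); the output has shape (3 * T, d).
    """
    T, d = state_emb.shape
    tokens = np.stack([rtg_emb, state_emb, action_emb], axis=1)  # (T, 3, d)
    return tokens.reshape(3 * T, d)


# Toy trajectory with three timesteps and placeholder embeddings (assumption).
rewards = [1.0, 2.0, 3.0]
rtg = returns_to_go(rewards)

rng = np.random.default_rng(0)
d_model = 4
rtg_emb = rng.normal(size=(3, d_model))
state_emb = rng.normal(size=(3, d_model))
action_emb = rng.normal(size=(3, d_model))
sequence = interleave_tokens(rtg_emb, state_emb, action_emb)
```

With this layout, swapping the architecture only changes the module that mixes information along the `3 * T` token axis; the input sequence stays the same.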

Toshihiro Ota • 2024

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	D4RL halfcheetah-medium-expert	Normalized Score 93.5	169
Offline Reinforcement Learning	D4RL hopper-medium-expert	Normalized Score 111.9	161
Offline Reinforcement Learning	D4RL walker2d-medium-expert	Normalized Score 111.6	140
Offline Reinforcement Learning	D4RL Medium-Replay Hopper	Normalized Score 89.1	109
Offline Reinforcement Learning	D4RL Medium HalfCheetah	Normalized Score 43.8	105
Offline Reinforcement Learning	D4RL Medium Walker2d	Normalized Score 80.3	104
Offline Reinforcement Learning	D4RL Medium-Replay HalfCheetah	Normalized Score 40.8	97
Offline Reinforcement Learning	D4RL Medium Hopper	Normalized Score 98.5	72
Offline Reinforcement Learning	D4RL walker2d medium-replay	Normalized Score 72.5	62
Offline Reinforcement Learning	D4RL Medium-Replay Walker2d	Normalized Score 79.3	52

Showing 10 of 21 rows
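
The Result column reports D4RL normalized scores. As a hedged sketch, the usual D4RL convention rescales a raw episode return so that a random policy maps to 0 and an expert policy maps to 100. The helper name `normalized_score` and the reference returns used below are hypothetical placeholders, not values taken from D4RL or from this paper.

```python
def normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw return so that random -> 0 and expert -> 100."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, chosen only for illustration.
example = normalized_score(raw_return=2500.0, random_return=0.0, expert_return=5000.0)
```

Under this convention a score above 100, such as the hopper-medium-expert entry, means the evaluated policy's average return exceeded the expert reference return.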
