Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces

About

Decision Transformer, a promising approach that applies Transformer architectures to reinforcement learning, relies on causal self-attention to model sequences of states, actions, and rewards. While this method has shown competitive results, this paper investigates the integration of the Mamba framework, known for its advanced capabilities in efficient and effective sequence modeling, into the Decision Transformer architecture, focusing on the potential performance enhancements in sequential decision-making tasks. Our study systematically evaluates this integration by conducting a series of experiments across various decision-making environments, comparing the modified Decision Transformer, Decision Mamba, with its traditional counterpart. This work contributes to the advancement of sequential decision-making models, suggesting that the architecture and training methodology of neural networks can significantly impact their performance in complex tasks, and highlighting the potential of Mamba as a valuable tool for improving the efficacy of Transformer-based models in reinforcement learning scenarios.
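As a rough illustration of the sequence format the abstract refers to: Decision Transformer-style models typically condition on returns-to-go (the sum of rewards from a timestep to the end of the episode) and interleave them with states and actions. The sketch below builds such a sequence for a toy trajectory. The function names `returns_to_go` and `interleave` are hypothetical, the string states and actions are placeholders, and the sequence-mixing layer that would consume these tokens (causal self-attention, or a Mamba block in Decision Mamba) is deliberately left out.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the episode."""
    rtg: List[float] = []
    running = 0.0
    # Walk backwards so each entry accumulates all later rewards.
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    rtg.reverse()
    return rtg


def interleave(
    rewards: Sequence[float],
    states: Sequence[object],
    actions: Sequence[object],
) -> List[Tuple[str, object]]:
    """Build the (return-to-go, state, action) token order for one trajectory.

    Hypothetical helper: a real model would embed each token; here the
    tokens are only tagged with their modality.
    """
    if not (len(rewards) == len(states) == len(actions)):
        raise ValueError("rewards, states and actions must have equal length")
    tokens: List[Tuple[str, object]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("return", g), ("state", s), ("action", a)])
    return tokens


# Toy trajectory with placeholder states and actions (assumed values).
tokens = interleave([1.0, 0.0, 2.0], ["s0", "s1", "s2"], ["a0", "a1", "a2"])
```

Each timestep contributes three tokens, so a trajectory of length T yields a sequence of length 3T for the sequence model to process.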

Toshihiro Ota • 2024

Related benchmarks

Task                                      Dataset                         Metric                   Result  Rank
Offline Reinforcement Learning            D4RL halfcheetah-medium-expert  Normalized Score         90.6    117
Offline Reinforcement Learning            D4RL hopper-medium-expert       Normalized Score         111     115
Offline Reinforcement Learning            D4RL walker2d-medium-expert     Normalized Score         108.3   86
Offline Reinforcement Learning            D4RL Medium-Replay Hopper       Normalized Score         81.7    72
Offline Reinforcement Learning            D4RL Medium-Replay HalfCheetah  Normalized Score         39.8    59
Offline Reinforcement Learning            D4RL Medium HalfCheetah         Normalized Score         42.8    59
Offline Reinforcement Learning            D4RL Medium Walker2d            Normalized Score         77.6    58
Offline Reinforcement Learning            D4RL walker2d medium-replay     Normalized Score         72.5    45
Offline Reinforcement Learning            D4RL Hopper Medium v2           Normalized Score         86.2    26
Offline multitask Reinforcement Learning  Franka Kitchen kitchen-mixed    Average Episodic Return  59.3    23
Showing 10 of 15 rows
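
The "Normalized Score" values on the D4RL datasets presumably follow the usual D4RL convention, in which a raw episode return is rescaled so that 0 corresponds to a random policy and 100 to an expert policy. A minimal sketch of that rescaling, assuming the reference returns are supplied by the caller; the function name is hypothetical and the numbers in the usage line are made up for illustration, not D4RL's reference values:

```python
def normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """Rescale a raw return so that random -> 0 and expert -> 100."""
    if expert_return == random_return:
        raise ValueError("expert_return and random_return must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Illustrative values only (assumed, not taken from D4RL or from the paper).
example = normalized_score(raw_return=50.0, random_return=0.0, expert_return=200.0)
```

Under this convention a score above 100, such as several of the medium-expert entries listed, means the policy's return exceeded the expert reference return.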
