Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing
About
Mamba-based models have drawn much attention in offline RL. However, their selective mechanism can be detrimental when it drops key steps from RL trajectories. To address this, we propose a simple yet effective architecture, called Decision MetaMamba (DMM), which replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information. By mixing the sequence across all channels simultaneously before the Mamba block, DMM prevents the information loss caused by selective scanning and residual gating. Extensive experiments demonstrate that DMM delivers state-of-the-art performance across diverse RL tasks. Furthermore, DMM achieves these results with a compact parameter footprint, demonstrating strong potential for real-world applications.
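The core idea of the sequence mixer can be sketched as a dense mixing step over the time axis applied before the selective SSM. The sketch below is illustrative only: shapes, names, and the causal mask are assumptions, not the paper's exact architecture.

```python
import numpy as np

def dense_sequence_mixer(x, W_seq):
    # x: (T, C) sequence of T tokens with C channels.
    # W_seq: (T, T) dense mixing matrix over the time axis.
    # Mixing along time lets every output step attend to earlier
    # steps across all channels jointly, before the selective SSM
    # has a chance to discard them.
    return W_seq @ x

rng = np.random.default_rng(0)
T, C = 8, 4
x = rng.standard_normal((T, C))
# Hypothetical lower-triangular mask keeps the mixer causal
# (no future leakage), as autoregressive decision models require.
W = np.tril(rng.standard_normal((T, T)))
y = dense_sequence_mixer(x, W)
print(y.shape)  # (8, 4)
```

Because the mixing matrix is lower triangular, the output at step t depends only on inputs up to step t, so the mixer stays compatible with autoregressive action prediction.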
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 92.6 | 117 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 110.2 | 115 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 110.2 | 86 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 95 | 72 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score | 41.1 | 59 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 43 | 59 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 83.6 | 58 |
| Offline Reinforcement Learning | D4RL walker2d medium-replay | Normalized Score | 74.1 | 45 |
| Offline Reinforcement Learning | D4RL Hopper Medium v2 | Normalized Score | 96.2 | 26 |
| Offline multitask Reinforcement Learning | Franka Kitchen kitchen-mixed | Average Episodic Return | 83 | 23 |