
Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing

About

Mamba-based models have drawn much attention in offline RL. However, their selective mechanism is often detrimental when key steps in RL sequences are omitted. To address this issue, we propose a simple yet effective architecture, called Decision MetaMamba (DMM), which replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information. By performing sequence mixing that considers all channels simultaneously before Mamba, DMM prevents information loss caused by selective scanning and residual gating. Extensive experiments demonstrate that DMM delivers state-of-the-art performance across diverse RL tasks. Moreover, DMM achieves these results with a compact parameter footprint, demonstrating strong potential for real-world applications.
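To make the core idea concrete, here is a minimal sketch of a dense layer-based sequence mixer that considers all channels simultaneously over a local window, in contrast to Mamba's per-channel causal convolution token mixer. This is an illustrative assumption, not the paper's implementation: the function name, the window size, and the use of a single flattened dense projection are all hypothetical choices for exposition.

```python
import numpy as np

def dense_sequence_mixer(x, W, b, window=4):
    """Causal local sequence mixing with one dense layer over all channels.

    x: (T, C) trajectory token embeddings
    W: (window * C, C) dense-layer weights (random here, learned in practice)
    b: (C,) bias
    """
    T, C = x.shape
    # Left-pad so every timestep sees exactly `window` past steps (causal).
    padded = np.concatenate([np.zeros((window - 1, C)), x], axis=0)
    out = np.empty_like(x)
    for t in range(T):
        # Flatten the local window across ALL channels, then project.
        # Every output channel can mix information from every input channel,
        # unlike a depthwise causal conv that mixes each channel separately.
        ctx = padded[t : t + window].reshape(-1)  # (window * C,)
        out[t] = ctx @ W + b
    return out

rng = np.random.default_rng(0)
T, C, k = 6, 8, 4
x = rng.standard_normal((T, C))
W = rng.standard_normal((k * C, C)) * 0.1
b = np.zeros(C)
y = dense_sequence_mixer(x, W, b, window=k)
print(y.shape)  # (6, 8)
```

In the DMM design, mixing of this kind happens before the selective SSM, so information that the selective scan might otherwise discard has already been blended into each token's representation.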

Wall Kim, Chaeyoung Song, Hanul Kim • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 92.6 | 117
Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 110.2 | 115
Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 110.2 | 86
Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 95 | 72
Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score | 41.1 | 59
Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 43 | 59
Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 83.6 | 58
Offline Reinforcement Learning | D4RL walker2d medium-replay | Normalized Score | 74.1 | 45
Offline Reinforcement Learning | D4RL Hopper Medium v2 | Normalized Score | 96.2 | 26
Offline Multitask Reinforcement Learning | Franka Kitchen kitchen-mixed | Average Episodic Return | 83 | 23
(Showing 10 of 15 rows)
