Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing
About
Mamba-based models have drawn much attention in offline RL. However, their selective mechanism can be detrimental when it drops key steps from RL trajectories. To address this, we propose a simple yet effective architecture, called Decision MetaMamba (DMM), which replaces Mamba's token mixer with a dense layer-based sequence mixer and modifies the positional structure to preserve local information. By mixing the sequence across all channels simultaneously before the Mamba block, DMM prevents the information loss caused by selective scanning and residual gating. Extensive experiments demonstrate that DMM delivers state-of-the-art performance across diverse RL tasks. Furthermore, DMM achieves these results with a compact parameter footprint, demonstrating strong potential for real-world applications.
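The core idea of the sequence mixer can be sketched as a dense mixing step over the time axis applied before the selective SSM. The sketch below is illustrative only: shapes, names, and the causal mask are assumptions, not the paper's exact architecture.

```python
import numpy as np

def dense_sequence_mixer(x, W_seq):
    # x: (T, C) sequence of T tokens with C channels.
    # W_seq: (T, T) dense mixing matrix over the time axis.
    # Mixing along time lets every output step attend to earlier
    # steps across all channels jointly, before the selective SSM
    # has a chance to discard them.
    return W_seq @ x

rng = np.random.default_rng(0)
T, C = 8, 4
x = rng.standard_normal((T, C))
# Hypothetical lower-triangular mask keeps the mixer causal
# (no future leakage), as autoregressive decision models require.
W = np.tril(rng.standard_normal((T, T)))
y = dense_sequence_mixer(x, W)
print(y.shape)  # (8, 4)
```

Because the mixing matrix is lower triangular, the output at step t depends only on inputs up to step t, so the mixer stays compatible with autoregressive action prediction.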
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 92.6 | 117 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 110.2 | 115 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 110.2 | 86 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 95 | 72 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score | 41.1 | 59 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 43 | 59 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 83.6 | 58 |
| Offline Reinforcement Learning | D4RL walker2d medium-replay | Normalized Score | 74.1 | 45 |
| Offline Reinforcement Learning | D4RL Hopper Medium v2 | Normalized Score | 96.2 | 26 |
| Offline multitask Reinforcement Learning | Franka Kitchen kitchen-mixed | Average Episodic Return | 83 | 23 |