Diffusion Modulation via Environment Mechanism Modeling for Planning
About
Diffusion models have shown promising capabilities in trajectory generation for planning in offline reinforcement learning (RL). However, conventional diffusion-based planning methods often overlook the consistency that consecutive transitions must maintain for a generated trajectory to be coherent in a real environment. This oversight can result in considerable discrepancies between the generated trajectories and the underlying mechanisms of the real environment. To address this problem, we propose a novel diffusion-based planning method, termed Diffusion Modulation via Environment Mechanism Modeling (DMEMM). DMEMM modulates diffusion model training by incorporating key RL environment mechanisms, particularly transition dynamics and reward functions. Experimental results demonstrate that DMEMM achieves state-of-the-art performance for planning with offline reinforcement learning.
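The abstract describes modulating the diffusion training objective with transition-dynamics and reward-function terms, but does not give the exact loss. The sketch below is an illustrative assumption of that idea, not the authors' implementation: a standard denoising loss plus penalties that measure how well the denoised trajectory obeys learned dynamics and reward models.

```python
import numpy as np

def dmemm_style_loss(noise_pred, noise, states, actions, rewards,
                     dynamics_fn, reward_fn, lam_dyn=1.0, lam_rew=1.0):
    """Hypothetical DMEMM-style objective (names and weights are assumptions).

    Combines a DDPM-style denoising loss with environment-mechanism
    penalties on the denoised trajectory.
    """
    # Standard diffusion denoising objective on the predicted noise.
    diff_loss = np.mean((noise_pred - noise) ** 2)
    # Transition-dynamics penalty: each denoised state should follow from
    # the previous state and action under the learned dynamics model.
    dyn_pen = np.mean((dynamics_fn(states[:-1], actions[:-1]) - states[1:]) ** 2)
    # Reward penalty: denoised rewards should match the learned reward model.
    rew_pen = np.mean((reward_fn(states[:-1], actions[:-1]) - rewards[:-1]) ** 2)
    return diff_loss + lam_dyn * dyn_pen + lam_rew * rew_pen

# Toy usage with hypothetical linear models on a length-8 trajectory.
rng = np.random.default_rng(0)
H, ds, da = 8, 3, 2
states = rng.normal(size=(H, ds))
actions = rng.normal(size=(H, da))
rewards = rng.normal(size=(H,))
noise = rng.normal(size=(H, ds + da))
noise_pred = noise + 0.1 * rng.normal(size=noise.shape)

A = rng.normal(size=(ds, ds))
B = rng.normal(size=(da, ds))
dynamics_fn = lambda s, a: s @ A + a @ B
reward_fn = lambda s, a: s.sum(axis=-1) + a.sum(axis=-1)

loss = dmemm_style_loss(noise_pred, noise, states, actions, rewards,
                        dynamics_fn, reward_fn)
```

Setting `lam_dyn = lam_rew = 0` recovers the plain denoising loss, so the penalties act purely as a modulation of standard diffusion training.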
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Locomotion | D4RL hopper-medium-replay | Normalized Score | 100.6 | 56 |
| Locomotion | D4RL walker2d-medium-replay | Normalized Score | 85.8 | 53 |
| Locomotion | D4RL walker2d-medium-expert | Normalized Score | 111.6 | 47 |
| Locomotion | D4RL halfcheetah-medium | Normalized Score | 49.2 | 44 |
| Locomotion | D4RL walker2d-medium | Normalized Score | 86.5 | 44 |
| Locomotion | D4RL hopper-medium | Normalized Score | 101.2 | 38 |
| Locomotion | D4RL hopper-medium-expert | Normalized Score | 115.9 | 38 |
| Locomotion | D4RL halfcheetah-medium-expert | Normalized Score | 94.6 | 37 |
| Locomotion | D4RL halfcheetah-medium-replay | Normalized Score | 0.461 | 33 |
| Navigation | D4RL maze2d-umaze | Normalized Return | 132.4 | 9 |
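The normalized scores in the table follow the standard D4RL convention, where 0 corresponds to a random policy's return and 100 to an expert policy's return. A minimal sketch of that computation (the reference returns below are illustrative placeholders, not the official D4RL constants):

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    # D4RL convention: 0 for a random policy, 100 for an expert policy.
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# Illustrative placeholder reference returns (not the official constants).
score = d4rl_normalized_score(1600.0, 0.0, 2000.0)  # → 80.0
```

Scores above 100, such as the 111.6 and 115.9 entries above, simply mean the policy's return exceeds the expert reference return.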