Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making
About
Offline decision-making via diffusion models often produces trajectories that are misaligned with system dynamics, limiting their reliability for control. We propose Model Predictive Diffuser (MPDiffuser), a compositional diffusion framework that combines a diffusion planner with a dynamics diffusion model to generate task-aligned and dynamically plausible trajectories. MPDiffuser interleaves planner and dynamics updates during sampling, progressively correcting feasibility while preserving task intent. A lightweight ranking module then selects trajectories that best satisfy task objectives. The compositional design improves sample efficiency and adaptability by enabling the dynamics model to leverage diverse and previously unseen data independently of the planner. Empirically, we demonstrate consistent improvements over prior diffusion-based methods on unconstrained (D4RL) and constrained (DSRL) benchmarks, and validate practicality through deployment on a real quadrupedal robot.
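The sampling procedure described above has a simple overall structure. Start from noise, alternate a planner update and a dynamics update at every reverse step while the injected noise shrinks, then rank the finished candidates by a task objective. The sketch below illustrates only that control flow on a toy 1-D system. It assumes hand-written stand-ins for both learned diffusion models: `planner_update` pulls states toward a goal, and `dynamics_update` clips each transition to `max_step`. Every function name, update rule, and constant here is hypothetical and is not the MPDiffuser implementation.

```python
import random

def planner_update(traj, goal, weight):
    # Hypothetical stand-in for the planner's denoising step:
    # nudge every state toward the task goal (preserves task intent).
    return [x + weight * (goal - x) for x in traj]

def dynamics_update(traj, x0, max_step):
    # Hypothetical stand-in for the dynamics model's denoising step:
    # re-roll from the initial state, clipping each transition so that
    # |x[t+1] - x[t]| <= max_step (the toy system's "dynamics").
    fixed = [x0]
    for x in traj[1:]:
        delta = max(-max_step, min(max_step, x - fixed[-1]))
        fixed.append(fixed[-1] + delta)
    return fixed

def sample_trajectory(x0, goal, horizon, steps, max_step, rng):
    # Start from Gaussian noise, with the first state pinned to the observation.
    traj = [x0] + [rng.gauss(0.0, 1.0) for _ in range(horizon - 1)]
    for k in range(steps, 0, -1):
        noise_scale = 0.1 * k / steps  # injected noise shrinks as k -> 0
        traj = planner_update(traj, goal, weight=0.3)   # task step
        traj = dynamics_update(traj, x0, max_step)      # feasibility step
        if k > 1:  # no noise after the final step, so the output stays feasible
            traj = [traj[0]] + [x + rng.gauss(0.0, noise_scale) for x in traj[1:]]
    return traj

def task_cost(traj, goal):
    # Hypothetical task objective: distance of the final state to the goal.
    return abs(traj[-1] - goal)

def plan(x0, goal, num_candidates=8, horizon=10, steps=20, max_step=0.5, seed=0):
    rng = random.Random(seed)
    candidates = [sample_trajectory(x0, goal, horizon, steps, max_step, rng)
                  for _ in range(num_candidates)]
    # Ranking step: keep the candidate that best satisfies the task objective.
    return min(candidates, key=lambda t: task_cost(t, goal))

best = plan(0.0, 3.0)
print([round(x, 2) for x in best])
```

Ending each reverse step with the dynamics update is one possible ordering. In this toy it guarantees that the returned trajectory respects the transition bound, while the planner update keeps pulling it toward the goal.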
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 98.4 | 117 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | Normalized Score | 110.4 | 115 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | Normalized Score | 110.7 | 86 |
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 98.3 | 72 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 47.9 | 59 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score | 43.5 | 59 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 77.6 | 58 |
| Offline Reinforcement Learning | D4RL Medium-Replay Walker2d | Normalized Score | 81.5 | 34 |
| Offline Reinforcement Learning | D4RL Medium Hopper | Normalized Score | 98.4 | 26 |
| Offline Reinforcement Learning | D4RL Kitchen-mixed v0 (test) | Normalized Score | 66.9 | 18 |
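The Normalized Score metric follows the usual D4RL convention. A raw episode return is rescaled so that a random policy maps to 0 and the expert reference policy maps to 100, which means entries above 100 exceed the expert reference. D4RL environments expose the unscaled fraction through `get_normalized_score`, and results are conventionally reported after multiplying by 100. The sketch below spells out that formula. The reference returns in the usage lines are made-up placeholders, not D4RL's actual per-environment constants.

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    # 0 -> random-policy return, 100 -> expert reference return.
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# Placeholder reference returns (hypothetical, for illustration only).
print(d4rl_normalized_score(3000.0, random_return=0.0, expert_return=4000.0))  # 75.0
print(d4rl_normalized_score(4400.0, random_return=0.0, expert_return=4000.0))  # 110.0
```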