Diffusion Model Predictive Control
About
We propose Diffusion Model Predictive Control (D-MPC), a novel MPC approach that learns a multi-step action proposal and a multi-step dynamics model, both using diffusion models, and combines them for use in online MPC. On the popular D4RL benchmark, we show performance that is significantly better than existing model-based offline planning methods using MPC (e.g. MBOP) and competitive with state-of-the-art (SOTA) model-based and model-free reinforcement learning methods. We additionally illustrate D-MPC's ability to optimize novel reward functions at run time and adapt to novel dynamics, and highlight its advantages compared to existing diffusion-based planning baselines.
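The planning loop described above (sample multi-step action proposals, roll them out through a multi-step dynamics model, score against the reward, execute the first action, replan) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the two diffusion models are replaced by random/toy stubs, and all names (`sample_action_proposals`, `rollout_dynamics`, `dmpc_step`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_action_proposals(state, horizon, n_samples, action_dim=2):
    """Stand-in for the learned action-proposal diffusion model:
    draws candidate multi-step action sequences."""
    return rng.normal(size=(n_samples, horizon, action_dim))

def rollout_dynamics(state, actions):
    """Stand-in for the learned multi-step dynamics diffusion model:
    predicts the state trajectory for each candidate action sequence."""
    # Toy dynamics: integrate the first action dimension from the start state.
    return np.cumsum(actions[..., :1], axis=1) + state[0]

def reward(states, actions):
    """Task reward; in D-MPC this can be swapped at run time."""
    return -np.abs(states).sum(axis=(1, 2))  # toy goal: drive state to zero

def dmpc_step(state, horizon=8, n_samples=64):
    """One receding-horizon MPC step: propose, roll out, score, select."""
    actions = sample_action_proposals(state, horizon, n_samples)
    states = rollout_dynamics(state, actions)
    scores = reward(states, actions)
    best = np.argmax(scores)
    return actions[best, 0]  # execute only the first action, then replan

# Online control loop (toy environment step).
state = np.array([1.0])
for t in range(5):
    a = dmpc_step(state)
    state = state + a[:1]
```

Because the reward function only enters at planning time, optimizing a novel reward amounts to swapping `reward` without retraining either model, which is the run-time flexibility the abstract highlights.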
Guangyao Zhou, Sivaramakrishnan Swaminathan, Rajkumar Vasudeva Raju, J. Swaroop Guntupalli, Wolfgang Lehrach, Joseph Ortiz, Antoine Dedieu, Miguel Lázaro-Gredilla, Kevin Murphy • 2024
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL Medium-Replay Hopper | Normalized Score | 92.5 | 72 |
| Offline Reinforcement Learning | D4RL Medium HalfCheetah | Normalized Score | 46 | 59 |
| Offline Reinforcement Learning | D4RL Medium-Replay HalfCheetah | Normalized Score | 41.1 | 59 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | Normalized Score | 76.2 | 58 |
| Offline Reinforcement Learning | D4RL Medium-Replay Walker2d | Normalized Score | 78.8 | 34 |
| Offline Reinforcement Learning | D4RL Medium Hopper | Normalized Score | 61.2 | 26 |
| Offline Reinforcement Learning | D4RL Kitchen-mixed v0 (test) | Normalized Score | 67.5 | 18 |
| Offline Reinforcement Learning | D4RL Kitchen kitchen-partial v0 (test) | Normalized Score | 73.3 | 18 |
| Offline Reinforcement Learning | puzzle-4x4-play OGBench 5 tasks v0 | Average Success Rate | 0.00 | 18 |
| Manipulation | OG-Bench cube-double-play-oraclerep v0 | Success Rate | 37 | 10 |
*Showing 10 of 19 benchmark rows.*