Model-Based Diffusion for Trajectory Optimization

About

Recent advances in diffusion models have demonstrated their strong capabilities in generating high-fidelity samples from complex distributions through an iterative refinement process. Despite the empirical success of diffusion models in motion planning and control, the model-free nature of these methods does not leverage readily available model information and limits their generalization to new scenarios beyond the training data (e.g., new robots with different dynamics). In this work, we introduce Model-Based Diffusion (MBD), an optimization approach using the diffusion process to solve trajectory optimization (TO) problems without data. The key idea is to explicitly compute the score function by leveraging the model information in TO problems, which is why we refer to our approach as model-based diffusion. Moreover, although MBD does not require external data, it can be naturally integrated with data of diverse qualities to steer the diffusion process. We also reveal that MBD has interesting connections to sampling-based optimization. Empirical evaluations show that MBD outperforms state-of-the-art reinforcement learning and sampling-based TO methods in challenging contact-rich tasks. Additionally, MBD's ability to integrate with data enhances its versatility and practical applicability, even with imperfect and infeasible data (e.g., partial-state demonstrations for high-dimensional humanoids), beyond the scope of standard diffusion models.
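The key idea in the abstract, computing the score from the model instead of learning it from data, can be illustrated with a toy sketch. This is not the authors' implementation. The quadratic `cost`, the linear noise schedule, the Boltzmann weighting with `temperature`, and the name `mc_diffusion_optimize` are all assumptions made for illustration. At each reverse step the sketch draws candidate solutions around the current noisy iterate, weights them by how well they score under the known cost, and uses the weighted mean to form a Monte Carlo score estimate.

```python
import numpy as np


def cost(x):
    # Hypothetical toy objective with its minimum at [1, -2].
    # In a trajectory optimization setting this would be a rollout cost
    # evaluated with the known dynamics model.
    return np.sum((x - np.array([1.0, -2.0])) ** 2, axis=-1)


def mc_diffusion_optimize(cost_fn, dim, n_steps=100, n_samples=256,
                          temperature=0.1, seed=0):
    """Reverse diffusion driven by a Monte Carlo score estimate (toy sketch)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 2e-2, n_steps)   # assumed linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    y = rng.standard_normal(dim)               # start from pure noise
    for i in reversed(range(n_steps)):
        ab = alpha_bars[i]
        # Candidate clean samples consistent with the noisy iterate y.
        sigma = np.sqrt((1.0 - ab) / ab)
        candidates = y / np.sqrt(ab) + sigma * rng.standard_normal((n_samples, dim))

        # Weight candidates by an assumed target density exp(-cost / temperature).
        costs = cost_fn(candidates)
        weights = np.exp(-(costs - costs.min()) / temperature)
        weights /= weights.sum()
        x0_hat = weights @ candidates          # weighted estimate of the clean sample

        # Score estimate for a Gaussian forward process, then one reverse step.
        score = -(y - np.sqrt(ab) * x0_hat) / (1.0 - ab)
        y = (y + (1.0 - ab) * score) / np.sqrt(alphas[i])
    return y


result = mc_diffusion_optimize(cost, dim=2)
```

The weighted average over sampled candidates is also where a connection to sampling-based optimization shows up in this toy version: each step resembles a cost-weighted update over perturbed samples, applied with a shrinking noise level.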

Chaoyi Pan, Zeji Yi, Guanya Shi, Guannan Qu • 2024

Related benchmarks

Task | Dataset | Result | Rank
Open-Loop Trajectory Planning | 100 Randomized Trials Quadrotor Waypoint Navigation (test) | Success Rate: 68 | 5
Trajectory Optimization | Walker2D | Computational Time (s): 34.6 | 5
Trajectory Optimization | Hopper | Computational Time (s): 26.5 | 5
Trajectory Optimization | Half Cheetah | Computational Time (s): 26.8 | 5
Trajectory Optimization | Ant | Computational Time (s): 16.2 | 5
Trajectory Optimization | Humanoid Standup | Computational Time (s): 17.7 | 5
Trajectory Optimization | Humanoid Running | Computational Time (s): 30 | 5
Trajectory Optimization | Push T | Time (s): 1.03e+3 | 5
Trajectory Planning | Bicycle 3D | Reward: 5.95 | 4
Trajectory Planning | NTrailer 5D | Reward: 5.48 | 4
Showing 10 of 15 rows
