Simple Hierarchical Planning with Diffusion

About

Diffusion-based generative methods have proven effective in modeling trajectories from offline datasets. However, they often face computational challenges and can falter in generalization, especially in capturing temporal abstractions for long-horizon tasks. To overcome this, we introduce the Hierarchical Diffuser, a simple, fast, yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning. Our model adopts a "jumpy" planning strategy at the higher level, which gives it a larger receptive field at a lower computational cost, a crucial factor for diffusion-based planning methods, as we have empirically verified. Additionally, the jumpy sub-goals guide our low-level planner, facilitating a fine-tuning stage and further improving our approach's effectiveness. We conducted empirical evaluations on standard offline reinforcement learning benchmarks, demonstrating our method's superior performance and efficiency in terms of training and planning speed compared to the non-hierarchical Diffuser as well as other hierarchical planning methods. Moreover, we explore our model's generalization capability, particularly on compositional out-of-distribution tasks.
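
To make the "jumpy" idea concrete, here is a minimal sketch of how a dense trajectory could be split into sparse sub-goals for a high-level planner and dense segments for a low-level planner. This is an illustration only, not the authors' implementation: the function name `split_jumpy`, the parameter `jump`, and the toy trajectory are assumptions introduced here, and no diffusion model is trained.

```python
import numpy as np

def split_jumpy(trajectory: np.ndarray, jump: int):
    """Split a (T, state_dim) trajectory into sparse sub-goals and dense segments.

    `split_jumpy` and `jump` are hypothetical names used for illustration.
    """
    if jump < 1:
        raise ValueError("jump must be >= 1")
    # High-level view: keep every `jump`-th state, so the planner covers the
    # same horizon with far fewer steps.
    subgoals = trajectory[::jump]
    # Low-level view: the dense states between each pair of consecutive
    # sub-goals, with both endpoints included.
    segments = [
        trajectory[i * jump : (i + 1) * jump + 1]
        for i in range(len(subgoals) - 1)
    ]
    return subgoals, segments

# Toy 1-D trajectory with T = 13 states (values 0.0 .. 12.0).
traj = np.arange(13, dtype=float).reshape(13, 1)
subgoals, segments = split_jumpy(traj, jump=4)
print(subgoals.ravel())               # sparse states seen by the high level
print([s.ravel() for s in segments])  # dense segments seen by the low level
```

In this toy case the high-level sequence has 4 states instead of 13, while each low-level segment spans only 5 states, which is the sense in which the upper level gets a wider receptive field for less computation.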

Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, Sungjin Ahn • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Continuous Control | MuJoCo Ant v4 | Average Return: 2.10e+3 | 46 |
| Offline Reinforcement Learning | D4RL Franka Kitchen | Mixed Success Rate: 71.7 | 43 |
| Continuous Control | MuJoCo Walker2d v4 | -- | 39 |
| Continuous Control | MuJoCo HalfCheetah v4 | Average Return: 4.01e+3 | 36 |
| Offline Reinforcement Learning | D4RL Maze2D | Return (UMaze): 155.8 | 31 |
| Offline Reinforcement Learning | D4RL AntMaze | Medium Diverse Success Rate: 88.7 | 27 |
| Continuous Control | MuJoCo Swimmer v4 | Total Reward: 58.4 | 19 |
| Continuous Control | Ant v4 | Average Return: 2.10e+3 | 15 |
| Continuous control locomotion | MuJoCo HalfCheetah v3 (train) | Final Performance: 4.01e+3 | 12 |
| Continuous control locomotion | MuJoCo Walker2d v3 (train) | Final Return: 3.32e+3 | 12 |
Showing 10 of 39 rows
