Diffused Task-Agnostic Milestone Planner

About

Addressing decision-making problems using sequence modeling to predict future trajectories has shown promising results in recent years. In this paper, we take a step further to leverage the sequence predictive method in wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method that uses a diffusion-based generative sequence model to plan a series of milestones in a latent space and has an agent follow the milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control. Furthermore, our approach exploits the generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving state-of-the-art performance on the most challenging vision-based manipulation benchmark.
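
The plan-then-follow structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the planner below is plain linear interpolation standing in for the diffusion model, the "latent space" is just a 2-D plane, and the follower is a fixed-step controller rather than a learned policy. The names `plan_milestones` and `follow_milestones` are hypothetical.

```python
import math


def plan_milestones(start, goal, num_milestones):
    """Stand-in planner (assumption): evenly spaced points from start to goal.

    The paper uses a diffusion-based sequence model here; linear
    interpolation is only a placeholder to make the loop runnable.
    """
    milestones = []
    for i in range(1, num_milestones + 1):
        fraction = i / num_milestones
        milestones.append(
            tuple(s + fraction * (g - s) for s, g in zip(start, goal))
        )
    return milestones


def follow_milestones(start, milestones, step_size=0.25, tolerance=0.05,
                      max_steps=1000):
    """Toy follower (assumption): step toward each milestone in turn."""
    state = tuple(float(x) for x in start)
    steps = 0
    for milestone in milestones:
        while steps < max_steps:
            distance = math.dist(state, milestone)
            if distance <= tolerance:
                break  # milestone reached, move on to the next one
            # Never overshoot: cap the step at the remaining distance.
            scale = min(step_size, distance) / distance
            state = tuple(s + scale * (m - s) for s, m in zip(state, milestone))
            steps += 1
    return state, steps


start, goal = (0.0, 0.0), (1.0, 2.0)
milestones = plan_milestones(start, goal, num_milestones=4)
final_state, steps_taken = follow_milestones(start, milestones)
```

The point of the sketch is the division of labor: a planner emits a short sequence of intermediate targets, and a simpler controller only needs to reach the next target rather than reason about the whole horizon.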

Mineui Hong, Minjae Kang, Songhwai Oh • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| hopper locomotion | D4RL hopper medium-replay | Normalized Score | 100 | 56 |
| walker2d locomotion | D4RL walker2d medium-replay | Normalized Score | 79.5 | 53 |
| Locomotion | D4RL walker2d-medium-expert | Normalized Score | 108.2 | 47 |
| Locomotion | D4RL Walker2d medium | Normalized Score | 82.7 | 44 |
| Locomotion | D4RL Halfcheetah medium | Normalized Score | 47.3 | 44 |
| hopper locomotion | D4RL Hopper medium | Normalized Score | 80.7 | 38 |
| hopper locomotion | D4RL hopper-medium-expert | Normalized Score | 109.4 | 38 |
| Locomotion | D4RL halfcheetah-medium-expert | Normalized Score | 88.2 | 37 |
| HalfCheetah | D4RL Medium-Replay v0 | Normalized Score | 42.6 | 28 |
| Offline Reinforcement Learning | D4RL AntMaze medium-play v2 | Averaged Score | 89.3 | 4 |
Showing 10 of 14 rows
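
For readers unfamiliar with the Normalized Score metric, D4RL conventionally rescales raw episode returns so that a random policy maps to 0 and an expert policy maps to 100. The sketch below shows that rescaling. The function name and the reference returns are hypothetical placeholders, not values from the benchmark or from this paper.

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw return so random -> 0 and expert -> 100."""
    if expert_return == random_return:
        raise ValueError("expert and random reference returns must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, chosen only for illustration.
random_return = 10.0
expert_return = 210.0

print(d4rl_normalized_score(110.0, random_return, expert_return))  # 50.0
```

Under this convention a score above 100, such as the medium-expert rows in the table, means the evaluated policy collected more return than the expert reference.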
