Diffused Task-Agnostic Milestone Planner
About
Addressing decision-making problems using sequence modeling to predict future trajectories has shown promising results in recent years. In this paper, we take a step further and apply the sequence-prediction approach to wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method that uses a diffusion-based generative sequence model to plan a series of milestones in a latent space and has an agent follow the milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to perform long-term planning and vision-based control efficiently. Furthermore, our approach exploits the generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving state-of-the-art performance on the most challenging vision-based manipulation benchmark.
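The "follow the milestones" step described above can be pictured with a toy sketch. This is not the paper's implementation: the diffusion planner is replaced by a fixed, hand-written list of 2-D points standing in for latent milestones, and the learned goal-conditioned agent is replaced by a straight-line step rule. The function name `follow_milestones` and its parameters `step_size`, `tol`, and `max_steps` are all hypothetical.

```python
import math


def follow_milestones(start, milestones, step_size=0.25, tol=0.1, max_steps=1000):
    """Visit each milestone in order and return the list of visited points.

    Assumption: a straight-line step stands in for a learned
    goal-conditioned policy, and `milestones` stands in for the
    output of a (not implemented here) diffusion planner.
    """
    pos = list(start)
    path = [tuple(pos)]
    for target in milestones:
        for _ in range(max_steps):
            delta = [t - p for t, p in zip(target, pos)]
            dist = math.sqrt(sum(d * d for d in delta))
            if dist <= tol:
                # Close enough: switch to the next milestone.
                break
            # Take a bounded step toward the current milestone.
            scale = min(step_size, dist) / dist
            pos = [p + scale * d for p, d in zip(pos, delta)]
            path.append(tuple(pos))
    return path


# Hypothetical milestones in a 2-D "latent" space.
plan = [(1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
path = follow_milestones((0.0, 0.0), plan)
```

The point of the sketch is only the control structure: the agent never needs the full trajectory, just the next intermediate target, which is what makes planning over a short sequence of milestones cheaper than planning every step.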
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| hopper locomotion | D4RL hopper medium-replay | Normalized Score | 100 | 56 |
| walker2d locomotion | D4RL walker2d medium-replay | Normalized Score | 79.5 | 53 |
| Locomotion | D4RL walker2d-medium-expert | Normalized Score | 108.2 | 47 |
| Locomotion | D4RL Walker2d medium | Normalized Score | 82.7 | 44 |
| Locomotion | D4RL Halfcheetah medium | Normalized Score | 47.3 | 44 |
| hopper locomotion | D4RL Hopper medium | Normalized Score | 80.7 | 38 |
| hopper locomotion | D4RL hopper-medium-expert | Normalized Score | 109.4 | 38 |
| Locomotion | D4RL halfcheetah-medium-expert | Normalized Score | 88.2 | 37 |
| HalfCheetah | D4RL Medium-Replay v0 | Normalized Score | 42.6 | 28 |
| Offline Reinforcement Learning | D4RL AntMaze medium-play v2 | Averaged Score | 89.3 | 4 |
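
For readers unfamiliar with the "Normalized Score" rows, D4RL rescales a raw episode return so that a random policy maps to 0 and an expert policy maps to 100. A minimal sketch of that rescaling follows; the function name and the reference returns in the usage lines are illustrative placeholders, not the actual per-environment constants.

```python
def normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw return so random -> 0 and expert -> 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Placeholder reference returns, chosen only for illustration.
random_ref = 0.0
expert_ref = 200.0
score = normalized_score(150.0, random_ref, expert_ref)
print(score)  # 75.0
```

Scores above 100, as in some medium-expert rows, simply mean the evaluated policy's return exceeded the expert reference return.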