Sub-policy Adaptation for Hierarchical Reinforcement Learning

About

Hierarchical reinforcement learning is a promising approach to tackle long-horizon decision-making problems with sparse rewards. Unfortunately, most methods still decouple the lower-level skill acquisition process and the training of a higher level that controls the skills in a new task. Leaving the skills fixed can lead to significant sub-optimality in the transfer setting. In this work, we propose a novel algorithm to discover a set of skills, and continuously adapt them along with the higher level even when training on a new task. Our main contributions are two-fold. First, we derive a new hierarchical policy gradient with an unbiased latent-dependent baseline, and we introduce Hierarchical Proximal Policy Optimization (HiPPO), an on-policy method to efficiently train all levels of the hierarchy jointly. Second, we propose a method for training time-abstractions that improves the robustness of the obtained skills to environment changes. Code and results are available at sites.google.com/view/hippo-rl.

Alexander C. Li, Carlos Florensa, Ignasi Clavera, Pieter Abbeel • 2019
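
The abstract mentions a latent-dependent baseline but does not spell out its form. One plausible reading is that the baseline subtracted from each return is conditioned on the skill (latent) that was active, so that each skill is compared only against its own typical return. The snippet below is a minimal sketch of that idea under strong simplifying assumptions: it ignores the state entirely and uses a tabular per-latent mean, and the function name `latent_baseline_advantages` is hypothetical. It is not the authors' implementation.

```python
from collections import defaultdict


def latent_baseline_advantages(latents, returns):
    """Return (advantages, baseline) using a per-latent mean return as baseline.

    Assumption: the baseline depends only on the active latent, not the state.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Accumulate the sum and count of returns observed under each latent.
    for z, g in zip(latents, returns):
        totals[z] += g
        counts[z] += 1
    # Baseline for latent z is the mean return collected while z was active.
    baseline = {z: totals[z] / counts[z] for z in totals}
    # Advantage is the return minus the baseline of the latent that produced it.
    advantages = [g - baseline[z] for z, g in zip(latents, returns)]
    return advantages, baseline


# Toy data: two skills with very different return scales.
latents = [0, 0, 1, 1]
returns = [1.0, 3.0, 10.0, 14.0]
advantages, baseline = latent_baseline_advantages(latents, returns)
```

With a single shared baseline, every sample from skill 1 in this toy data would look better than every sample from skill 0. Conditioning on the latent removes that offset, which is the intuition this sketch is meant to convey.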

Related benchmarks

Task	Dataset	Result
Hopper Hop	DeepMind Control suite	Average Return: 102	11
Walker Run	DeepMind Control suite	Average Return: 472	11
Cheetah Run	DeepMind Control suite	Average Return: 458	8
fetch_pick_place	Gymnasium Robotics	Cumulative Episodic Reward: 100	4
fetch_push	Gymnasium Robotics	Cumulative Reward: 100	4
pendulum_swingup	DeepMind Control Suite (DMC)	Cumulative Episodic Reward: 817	4
cartpole_swingup	DeepMind Control Suite (DMC)	Cumulative Reward: 852	4
Long-horizon sparse reward navigation	AntMaze Medium	Cumulative Episodic Rewards: 0.00e+0	4
Long-horizon sparse reward navigation	AntMaze Large	Cumulative Episodic Rewards: 0.00e+0	4
quadruped_run	DeepMind Control Suite (DMC)	Cumulative Episodic Reward: 572	4
