Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay

About

Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training, and so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of the goal states as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.

Tianhong Dai, Hengyan Liu, Kai Arulkumaran, Guangyu Ren, Anil Anthony Bharath • 2021
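The two selection steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not specify the kernel or the sampling procedure, so the RBF kernel, the `length_scale` parameter, and all function names here are assumptions made for illustration. The second step uses greedy determinant maximisation, a deterministic stand-in for k-DPP sampling rather than a true k-DPP sampler.

```python
import numpy as np

def rbf_kernel(goals, length_scale=1.0):
    """Similarity matrix over goal states (assumed kernel, not from the paper)."""
    diffs = goals[:, None, :] - goals[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * length_scale ** 2))

def diversity(goals, length_scale=1.0):
    """DPP-style diversity score: determinant of the kernel matrix.
    Near 1 for well-separated goals, near 0 for near-duplicate goals."""
    return float(np.linalg.det(rbf_kernel(goals, length_scale)))

def sample_trajectories(trajectory_goals, batch_size, rng):
    """Sample trajectory indices with probability proportional to diversity."""
    scores = np.array([diversity(g) for g in trajectory_goals])
    scores = np.clip(scores, 0.0, None)  # guard against tiny negative determinants
    total = scores.sum()
    if total == 0.0:
        # Fall back to uniform sampling when no trajectory shows any diversity.
        probs = np.full(len(scores), 1.0 / len(scores))
    else:
        probs = scores / total
    return rng.choice(len(trajectory_goals), size=batch_size, p=probs)

def greedy_k_diverse(goals, k, length_scale=1.0):
    """Pick k indices by greedily maximising the kernel sub-determinant.
    This approximates the most diverse size-k subset; it does not sample
    from a k-DPP."""
    kernel = rbf_kernel(goals, length_scale)
    selected = []
    for _ in range(k):
        best_index, best_det = None, -np.inf
        for i in range(len(goals)):
            if i in selected:
                continue
            idx = selected + [i]
            det = np.linalg.det(kernel[np.ix_(idx, idx)])
            if det > best_det:
                best_index, best_det = i, det
        selected.append(best_index)
    return selected

# Hypothetical toy data: each array holds the 2-D goal states of one trajectory.
spread = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
clustered = np.array([[0.0, 0.0], [0.01, 0.0], [0.0, 0.01]])

rng = np.random.default_rng(0)
sampled = sample_trajectories([spread, clustered], batch_size=100, rng=rng)

two_clusters = np.array([[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [5.01, 5.0]])
picked = greedy_k_diverse(two_clusters, k=2)
```

In this toy setup the trajectory with well-separated goals dominates the sampled batch, and the greedy selection takes one goal from each cluster instead of two near-duplicates. This is the behaviour the diversity criterion is meant to encourage.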

Related benchmarks

Task	Dataset	Metric	Result
Robotic Pushing	FetchPush v1	Success Rate	100	10
Robotic Hand Reaching	HandReach v0	Success Rate	62	10
Robotic Block Manipulation	HandManipulateBlockFull v0	Success Rate	7	10
Robotic Egg Manipulation	HandManipulateEggFull v0	Success Rate	29	10
Robotic Pen Rotation	HandManipulatePenRotate v0	Success Rate	25	10
Robotic Pick-and-Place	FetchPickAndPlace v1	Success Rate	93	10
Robotic Manipulation	FetchPush v1	Time to Threshold (Epochs)	14	5
Robotic Manipulation	HandReach v0	Cumulative Regret	60.5	5
Robotic Manipulation	HandManipulatePenRotate v0	Time to Threshold (Epochs)	22	5
Robotic Manipulation	FetchPickAndPlace v1	Time to Threshold (Epochs)	40	5

Showing 10 of 12 rows
