
Hindsight Experience Replay

About

Dealing with sparse rewards is one of the biggest challenges in Reinforcement Learning (RL). We present a novel technique called Hindsight Experience Replay which allows sample-efficient learning from rewards that are sparse and binary, thereby avoiding the need for complicated reward engineering. It can be combined with an arbitrary off-policy RL algorithm and may be seen as a form of implicit curriculum. We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. We show that our policies trained on a physics simulation can be deployed on a physical robot and successfully complete the task.
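The core idea described above can be sketched in a few lines: after an episode ends, each transition is stored not only with the originally intended goal but also relabeled with goals that were actually achieved later in the episode, with the sparse binary reward recomputed for each new goal. The sketch below is a minimal illustration, not the authors' implementation; `her_relabel`, the tuple layout, and the simplification that the "achieved goal" equals the next state are all assumptions made for brevity.

```python
import random

def her_relabel(episode, k=4, reward_fn=None):
    """Augment an episode's transitions with hindsight goals.

    episode: list of (state, action, next_state, goal) tuples, where the
    achieved goal is assumed (for this sketch) to equal next_state.
    Uses the "future" strategy: for each transition, sample up to k goals
    achieved later in the same episode, relabel the transition as if those
    had been the intended goals, and recompute the sparse binary reward.
    """
    if reward_fn is None:
        # Sparse binary reward: 0 on reaching the goal, -1 otherwise.
        reward_fn = lambda achieved, goal: 0.0 if achieved == goal else -1.0

    buffer = []
    for t, (s, a, s_next, g) in enumerate(episode):
        # Store the original transition with its original goal.
        buffer.append((s, a, s_next, g, reward_fn(s_next, g)))
        # Relabel with goals actually achieved from step t onward.
        future_achieved = [tr[2] for tr in episode[t:]]
        for g_new in random.sample(future_achieved, min(k, len(future_achieved))):
            buffer.append((s, a, s_next, g_new, reward_fn(s_next, g_new)))
    return buffer
```

Because every relabeled goal was actually reached, some stored transitions always carry a success reward, which is what turns an otherwise uninformative binary signal into a usable learning signal for any off-policy algorithm consuming the buffer.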

Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba • 2017

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Continual Reinforcement Learning | Meta-World MT50 v2 | AP | 71.2 | 11 |
| Offline Goal-Conditioned Reinforcement Learning | FetchReach (offline) | Discounted Return | 29.8 | 10 |
| Robotic Block Manipulation | HandManipulateBlockFull v0 | Success Rate (%) | 4 | 10 |
| Robotic Egg Manipulation | HandManipulateEggFull v0 | Success Rate (%) | 22 | 10 |
| Robotic Hand Reaching | HandReach v0 | Success Rate (%) | 54 | 10 |
| Robotic Pushing | FetchPush v1 | Success Rate (%) | 99 | 10 |
| Visual Pickup | SkewFit | Goal Reaching Error (m) | 0.035 | 10 |
| Offline Goal-Conditioned Reinforcement Learning | FetchPick (offline) | Discounted Return | 16.8 | 10 |
| Robotic Pen Rotation | HandManipulatePenRotate v0 | Success Rate (%) | 18 | 10 |
| Robotic Pick-and-Place | FetchPickAndPlace v1 | Success Rate (%) | 88 | 10 |
Showing 10 of 30 rows
