
Energy-Based Hindsight Experience Prioritization

About

In Hindsight Experience Replay (HER), a reinforcement learning agent is trained by treating whatever it has achieved as virtual goals. However, in previous work, the experience was replayed at random, without considering which episode might be the most valuable for learning. In this paper, we develop an energy-based framework for prioritizing hindsight experience in robotic manipulation tasks. Our approach is inspired by the work-energy principle in physics. We define a trajectory energy function as the sum of the transition energy of the target object over the trajectory. We hypothesize that replaying episodes that have high trajectory energy is more effective for reinforcement learning in robotics. To verify our hypothesis, we designed a framework for hindsight experience prioritization based on the trajectory energy of goal states. The trajectory energy function takes the potential, kinetic, and rotational energy into consideration. We evaluate our Energy-Based Prioritization (EBP) approach on four challenging robotic manipulation tasks in simulation. Our empirical results show that our proposed method surpasses state-of-the-art approaches in terms of both performance and sample-efficiency on all four tasks, without increasing computational time. A video showing experimental results is available at https://youtu.be/jtsF2tTeUGQ

Rui Zhao, Volker Tresp • 2018
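The abstract describes the trajectory energy only in words, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes the object is a point mass, that the transition energy is the increase in total energy between consecutive timesteps clipped to a non-negative range, and that episodes are sampled with probability proportional to their trajectory energy. The function names and the values of `mass`, `dt`, and `e_max` are hypothetical, and rotational energy is left out for brevity.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2


def trajectory_energy(positions, mass=1.0, dt=0.04, e_max=2.5):
    """Sum of clipped energy increases of the target object over one episode.

    positions: array of shape (T, 3) with the object's x/y/z position per step.
    mass, dt and e_max are illustrative assumptions, not values from the source.
    """
    positions = np.asarray(positions, dtype=float)
    # Potential energy from the object's height (z-coordinate).
    potential = mass * G * positions[:, 2]
    # Kinetic energy from a finite-difference velocity estimate;
    # the first step has no predecessor, so its velocity is zero.
    velocity = np.diff(positions, axis=0, prepend=positions[:1]) / dt
    kinetic = 0.5 * mass * np.sum(velocity ** 2, axis=1)
    total = potential + kinetic
    # Assumed transition energy: energy gained between consecutive steps,
    # clipped to [0, e_max] so that only increases count.
    transition = np.clip(np.diff(total), 0.0, e_max)
    return float(transition.sum())


def replay_probabilities(energies):
    """Turn per-episode trajectory energies into sampling probabilities."""
    energies = np.asarray(energies, dtype=float)
    total = energies.sum()
    if total <= 0.0:
        # No episode moved the object: fall back to uniform replay.
        return np.full(len(energies), 1.0 / len(energies))
    return energies / total


# Two toy episodes: the object stays put vs. the object is lifted steadily.
static_episode = np.zeros((5, 3))
lifted_episode = np.zeros((5, 3))
lifted_episode[:, 2] = np.linspace(0.0, 0.04, 5)

energies = [trajectory_energy(static_episode), trajectory_energy(lifted_episode)]
probs = replay_probabilities(energies)

# Draw one episode index for replay according to its priority.
rng = np.random.default_rng(0)
episode_index = rng.choice(len(energies), p=probs)
```

Under these assumptions, an episode in which the object never moves has zero trajectory energy and is never replayed unless every episode in the buffer is static, in which case sampling falls back to uniform.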

Related benchmarks

Task                        Dataset                     Metric                      Result  Rank
Robotic Pen Rotation        HandManipulatePenRotate v0  Success Rate                24      10
Robotic Pick-and-Place      FetchPickAndPlace v1        Success Rate                94      10
Robotic Hand Reaching       HandReach v0                Success Rate                42      10
Robotic Pushing             FetchPush v1                Success Rate                99      10
Robotic Egg Manipulation    HandManipulateEggFull v0    Success Rate                7       10
Robotic Block Manipulation  HandManipulateBlockFull v0  Success Rate                0       10
Robotic Manipulation        FetchPush v1                Time-to-Threshold (Epochs)  18      5
Robotic Manipulation        HandReach v0                Cumulative Regret           75.2    5
Robotic Manipulation        FetchPickAndPlace v1        Time-to-Threshold (Epochs)  85      5
Robotic Manipulation        HandManipulateEggFull v0    Cumulative Regret (R)       92.1    5
Showing 10 of 12 rows
