Learning Task Agnostic Skills with Data-driven Guidance

About

To increase autonomy in reinforcement learning, agents need to learn useful behaviours without reliance on manually designed reward functions. To that end, skill discovery methods have been used to learn the intrinsic options available to an agent using task-agnostic objectives. However, without the guidance of task-specific rewards, emergent behaviours are generally useless due to the under-constrained problem of skill discovery in complex and high-dimensional spaces. This paper proposes a framework for guiding the skill discovery towards the subset of expert-visited states using a learned state projection. We apply our method in various reinforcement learning (RL) tasks and show that such a projection results in more useful behaviours.

Even Klemsdal, Sverre Herland, Abdulmajid Murad • 2021

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Downstream Task Performance | Ant North | Average Performance | -2.12e+3 | 7 |
| Safe Locomotion | HalfCheetah Not-Flip | Safe State Ratio | 100 | 7 |
| Downstream Task Performance | Ant Range | Average Performance | -717.9 | 7 |
| Safe Locomotion | Humanoid Hole | Safe State Ratio | 100 | 7 |
| Safe Locomotion | Safety-Gym Hazard | Safe State Ratio | 33.5 | 7 |
| Safe Locomotion | HalfCheetah Right | Safe State Ratio | 52.7 | 7 |
| Safe Locomotion | Ant Range-North | Safe State Ratio | 40.2 | 7 |
| Safe Locomotion | Ant North | Safe State Ratio | 20.1 | 7 |
| Safe Locomotion | Ant Range | Safe State Ratio | 28.1 | 7 |
| Safe Locomotion | Ant Hole-North | Safe State Ratio | 76.9 | 7 |
Showing 10 of 14 rows
