
Prioritized Level Replay

About

Environments with procedurally generated content serve as important benchmarks for testing systematic generalization in deep reinforcement learning. In this setting, each level is an algorithmically created environment instance with a unique configuration of its factors of variation. Training on a prespecified subset of levels allows for testing generalization to unseen levels. What can be learned from a level depends on the current policy, yet prior work defaults to uniform sampling of training levels independently of the policy. We introduce Prioritized Level Replay (PLR), a general framework for selectively sampling the next training level by prioritizing those with higher estimated learning potential when revisited in the future. We show that TD-errors effectively estimate a level's future learning potential and, when used to guide the sampling procedure, induce an emergent curriculum of increasingly difficult levels. By adapting the sampling of training levels, PLR significantly improves sample efficiency and generalization on Procgen Benchmark, matching the previous state-of-the-art in test return, and readily combines with other methods. Combined with the previous leading method, PLR raises the state-of-the-art to over 76% improvement in test return relative to standard RL baselines.
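
The sampling idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says that TD-errors estimate a level's learning potential and guide sampling, so the specific score (mean absolute one-step TD-error), the rank-based weighting, and the names `level_score`, `replay_probabilities`, `sample_level`, `gamma`, and `temperature` are all hypothetical choices made for this example.

```python
import random


def td_errors(rewards, values, gamma=0.99):
    """One-step TD-errors for one episode on a level.

    Assumption: the value after the final step is treated as 0.
    """
    next_values = list(values[1:]) + [0.0]
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def level_score(rewards, values, gamma=0.99):
    """Hypothetical learning-potential score: mean absolute TD-error."""
    errors = td_errors(rewards, values, gamma)
    return sum(abs(e) for e in errors) / len(errors)


def replay_probabilities(scores, temperature=1.0):
    """Turn per-level scores into sampling probabilities.

    Assumption: rank-based weighting, weight = (1 / rank) ** (1 / temperature),
    so higher-scoring levels are replayed more often.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    weights = {
        level: (1.0 / (rank + 1)) ** (1.0 / temperature)
        for rank, level in enumerate(ordered)
    }
    total = sum(weights.values())
    return {level: w / total for level, w in weights.items()}


def sample_level(scores, rng, temperature=1.0):
    """Sample the next training level according to the replay probabilities."""
    probs = replay_probabilities(scores, temperature)
    levels = list(probs)
    return rng.choices(levels, weights=[probs[lv] for lv in levels], k=1)[0]


if __name__ == "__main__":
    # Toy scores for three level seeds (made-up numbers for illustration).
    scores = {"seed_0": 0.1, "seed_1": 0.9, "seed_2": 0.5}
    rng = random.Random(0)
    print(replay_probabilities(scores))
    print(sample_level(scores, rng))
```

In a training loop, one would recompute `level_score` for a level after each rollout on it and store the result in `scores`, so the sampling distribution tracks the current policy rather than staying uniform.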

Minqi Jiang, Edward Grefenstette, Tim Rocktäschel • 2020

Related benchmarks

Task | Dataset | Result | Rank
Reinforcement Learning | Procgen (test) | BigFish Return: 10.9 | 21
Navigation | MiniWorld FourRooms | Success Rate: 64 | 15
Partially observable navigation | Minigrid SimpleCrossing | Solved Rate: 88 | 6
Partially observable navigation | Minigrid SmallCorridor | Solved Rate: 97 | 6
2D bipedal locomotion | Basic (OpenAI Gym) (test) | Average Return: 306 | 6
2D bipedal locomotion | Hardcore (OpenAI Gym) (test) | Average Return: 116.6 | 6
2D bipedal locomotion | Stairs (test) | Average Return: 58.4 | 6
2D bipedal locomotion | PitGap (test) | Average Return: 54.2 | 6
2D bipedal locomotion | Stump (test) | Average Return: 9.2 | 6
2D bipedal locomotion | Roughness (test) | Average Return: 144.5 | 6
Showing 10 of 20 rows
