
Self-supervised network distillation: an effective approach to exploration in sparse reward environments

About

Reinforcement learning can solve decision-making problems and train an agent to behave in an environment according to a predesigned reward function. However, this approach becomes problematic if the reward is so sparse that the agent never encounters it during exploration. One solution is to equip the agent with intrinsic motivation, which provides informed exploration during which the agent is also likely to encounter the external reward. Novelty detection is one of the promising branches of intrinsic motivation research. We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms that uses the distillation error as a novelty indicator, where both the predictor model and the target model are trained. We adapted three existing self-supervised methods for this purpose and experimentally tested them on a set of ten environments that are considered difficult to explore. The results show that, compared to the baseline models, our approach achieves faster growth and a higher external reward for the same training time, which implies improved exploration in very sparse reward environments. In addition, the analytical methods we applied provide valuable explanatory insights into our proposed models.
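The core mechanism described above — using the distillation error between a predictor model and a target model as an intrinsic reward — can be illustrated with a minimal sketch. The linear "networks" and dimensions below are stand-ins of our own choosing (the paper uses deep encoders, and in SND the target is also trained with a self-supervised objective, which is omitted here for brevity); only the reward-as-distillation-error idea is taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-ins for the two models (assumption for illustration only):
# in SND both are deep networks, and the target is trained by self-supervision.
obs_dim, feat_dim = 8, 4
W_target = rng.normal(size=(feat_dim, obs_dim))     # target model
W_predictor = rng.normal(size=(feat_dim, obs_dim))  # predictor model

def intrinsic_reward(state):
    """Distillation error: squared distance between predictor and target features.
    Large for novel states, shrinking as the predictor learns to match the target."""
    err = W_predictor @ state - W_target @ state
    return 0.5 * float(np.sum(err ** 2))

def predictor_step(state, lr=0.01):
    """One gradient step minimizing the distillation error w.r.t. the predictor."""
    global W_predictor
    err = W_predictor @ state - W_target @ state  # (feat_dim,)
    grad = np.outer(err, state)                   # gradient of 0.5*||err||^2
    W_predictor -= lr * grad

# A state visited repeatedly becomes "familiar": its intrinsic reward decays.
state = rng.normal(size=obs_dim)
before = intrinsic_reward(state)
for _ in range(200):
    predictor_step(state)
after = intrinsic_reward(state)
```

In a full agent this intrinsic reward would be added to the (sparse) external reward when training the policy, so the agent is driven toward states the predictor cannot yet reproduce.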

Matej Pecháč, Michal Chovanec, Igor Farkaš • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reinforcement Learning | Atari 2600 Montezuma's Revenge ALE (test) | Score: 1.50e+4 | 24 |
| Reinforcement Learning | Atari 2600 Gravitar ALE (test) | Score: 6.71e+3 | 19 |
| Reinforcement Learning | Atari 2600 Private Eye ALE (test) | Score: 1.73e+4 | 19 |
| Reinforcement Learning | Atari Venture ALE (test) | Average Maximal Score: 2.19e+3 | 5 |
| Reinforcement Learning | Atari Solaris ALE (test) | Average Maximal Score: 1.25e+4 | 5 |
| Reinforcement Learning | Gravitar (test) | Avg Cumulative External Reward: 10.05 | 5 |
| Reinforcement Learning | Venture (test) | Avg Cumulative External Reward: 11.36 | 5 |
| Reinforcement Learning | Private Eye (test) | Avg Cumulative External Reward per Episode: 6.44 | 5 |
| Reinforcement Learning | Solaris (test) | Avg Cumulative Reward: 11.61 | 5 |
| Reinforcement Learning | Caveflyer (test) | Avg Cumulative Reward: 11.14 | 5 |

Showing 10 of 15 rows.
