The Primacy Bias in Deep Reinforcement Learning
About
This work identifies a common flaw of deep reinforcement learning (RL) algorithms: a tendency to rely on early interactions and to ignore useful evidence encountered later. Because deep RL agents train on progressively growing datasets, they risk overfitting to their earliest experiences, which can harm the rest of the learning process. Inspired by cognitive science, we refer to this effect as the primacy bias. Through a series of experiments, we dissect the algorithmic aspects of deep RL that exacerbate this bias. We then propose a simple yet generally applicable mechanism that tackles the primacy bias by periodically resetting a part of the agent. We apply this mechanism to algorithms in both discrete-action (Atari 100k) and continuous-action (DeepMind Control Suite) domains, consistently improving their performance.
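The core mechanism described above is to periodically re-initialize part of the agent's network while keeping the replay buffer, so the agent relearns from all collected data instead of its first impressions. Below is a minimal, framework-free sketch of that idea; the parameter representation, layer sizes, and the `reset_every` / `n_reset_layers` names are illustrative assumptions, not the paper's actual implementation.

```python
import random

def init_params(rng, sizes):
    """Illustrative stand-in for a network: one flat weight list per layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for n in sizes]

def periodic_reset(step, params, rng, reset_every=200, n_reset_layers=1):
    """Every `reset_every` steps, re-initialize the last `n_reset_layers`
    layers in place. Earlier layers (and, in a real agent, the replay
    buffer) are left untouched, mirroring the partial-reset idea."""
    if step > 0 and step % reset_every == 0:
        for i in range(len(params) - n_reset_layers, len(params)):
            params[i] = [rng.uniform(-1.0, 1.0) for _ in range(len(params[i]))]
    return params
```

In a training loop, `periodic_reset` would be called once per gradient step; the reset layers are then retrained on the full replay buffer, which is what counteracts overfitting to early experience.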
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reinforcement Learning | Acrobot v1 | Mean Return | -264.6 | 14 |
| Navigation and Procedural Generation RL | MiniGrid | GoToDoor-8x8 | 71 | 9 |
| Reinforcement Learning | MiniGrid v0 (test) | GoToDoor-8x8 Success Rate | 0.71 | 9 |
| Reinforcement Learning | DMC Walker-run | Normalized AUC | 725.4 | 8 |
| Reinforcement Learning | DMC Hopper-hop | Normalized AUC | 248.9 | 8 |
| Reinforcement Learning | DMC Quadruped-run | Normalized AUC | 687.4 | 8 |
| Reinforcement Learning | DMC Cheetah-run | Normalized AUC | 595.1 | 8 |
| Reinforcement Learning | Hopper v5 (strong-drift) | Final Return | 16.32 | 5 |