Effective Diversity in Population Based Reinforcement Learning
About
Exploration is a key problem in reinforcement learning, since agents can only learn from the data they acquire in the environment. With that in mind, maintaining a population of agents is an attractive method, as it allows data to be collected with a diverse set of behaviors. This behavioral diversity is often boosted via multi-objective loss functions. However, such approaches typically leverage mean-field updates based on pairwise distances, which makes them susceptible to cycling behaviors and increased redundancy. In addition, explicitly boosting diversity often has a detrimental impact on optimizing already fruitful behaviors for reward. As such, the reward-diversity trade-off typically relies on heuristics. Finally, such methods require behavioral representations, which are often handcrafted and domain specific. In this paper, we introduce an approach that optimizes all members of a population simultaneously. Rather than using pairwise distances, we measure the volume spanned by the entire population in a behavioral manifold defined by task-agnostic behavioral embeddings. In addition, our algorithm, Diversity via Determinants (DvD), adapts the degree of diversity during training using online learning techniques. We introduce both evolutionary and gradient-based instantiations of DvD and show that they effectively improve exploration without reducing performance when better exploration is not required.
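The abstract describes the core idea of measuring population diversity as a volume via a determinant of a kernel matrix over behavioral embeddings. Below is a minimal sketch of that idea, assuming an RBF kernel, a fixed trade-off weight `lam` (the paper adapts this online, which is not reproduced here), and hypothetical function names (`rbf_kernel`, `dvd_diversity`, `dvd_objective`); it is an illustration of the determinant-based diversity term, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(embeddings, lengthscale=1.0):
    # Pairwise squared distances between behavioral embeddings (one row per agent).
    sq_dists = np.sum((embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def dvd_diversity(embeddings, jitter=1e-6):
    # Diversity as the (log-)determinant of the population's kernel matrix:
    # the determinant measures the squared volume spanned by the embeddings
    # in feature space, so redundant agents drive it toward zero.
    K = rbf_kernel(embeddings)
    K += jitter * np.eye(K.shape[0])  # jitter for numerical stability
    _, logdet = np.linalg.slogdet(K)
    return logdet

def dvd_objective(rewards, embeddings, lam=0.5):
    # Combined population objective: mean reward plus a diversity bonus.
    # lam is fixed here; in the paper the degree of diversity is adapted
    # during training with online learning techniques.
    return (1.0 - lam) * np.mean(rewards) + lam * dvd_diversity(embeddings)

# Example: a population of 5 agents with 8-dimensional behavioral embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))
rew = rng.normal(loc=100.0, scale=10.0, size=5)
print(dvd_objective(rew, emb))
```

Because the determinant is computed over the whole population at once, maximizing this objective updates all members jointly, in contrast to pairwise-distance penalties that compare agents two at a time.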
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Locomotion | Humanoid | Cumulative Reward | 4.50e+3 | 16 |
| Multi-Agent Reinforcement Learning | SMAC 2m1z | State Entropy | 0.03 | 12 |
| Strategy Discovery | GRF 3v1 | Distinct Strategies | 3 | 11 |
| Multi-Agent Reinforcement Learning | SMAC 2c64zg | Win Rate | 100% | 7 |
| State Entropy Estimation | GRF 3v1 | State Entropy | 0.01 | 7 |
| Multi-Agent Reinforcement Learning | GRF 3v1 hard | Win Rate | 83% | 7 |
| Multi-Agent Reinforcement Learning | SMAC 2c_vs_64zg | State Entropy | 0.057 | 6 |