
"Other-Play" for Zero-Shot Coordination

About

We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans). Standard Multi-Agent Reinforcement Learning (MARL) methods typically focus on the self-play (SP) setting, where agents construct strategies by playing the game with themselves repeatedly. Unfortunately, applying SP naively to the zero-shot coordination problem can produce agents that establish highly specialized conventions that do not carry over to novel partners they have not been trained with. We introduce a novel learning algorithm called other-play (OP) that enhances self-play by looking for more robust strategies, exploiting the presence of known symmetries in the underlying problem. We characterize OP both theoretically and experimentally. We study the cooperative card game Hanabi and show that OP agents achieve higher scores when paired with independently trained agents. In preliminary results we also show that our OP agents obtain higher average scores when paired with human players, compared to state-of-the-art SP agents.
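The intuition behind OP can be sketched with a toy coordination game of the kind discussed in the paper: ten levers, where nine pay 1.0 and one pays 0.9, and both players score only if they pull the same lever. The nine 1.0 levers are interchangeable under relabeling, so a self-play convention that settles on one of them fails in cross-play, while the unique 0.9 lever is symmetry-invariant. The sketch below is illustrative only; the function name, sampling scheme, and constants are our own assumptions, not the paper's implementation.

```python
import random

# Lever coordination game (illustrative): 10 levers; matching your partner's
# lever yields its payoff, mismatching yields 0. Nine levers pay 1.0 and are
# interchangeable under relabeling; one unique lever pays 0.9.
PAYOFFS = [1.0] * 9 + [0.9]
SYMMETRIC = list(range(9))  # indices of the interchangeable levers

def op_value(action, n_samples=10_000, seed=0):
    """Monte-Carlo estimate of the other-play value of always pulling `action`:
    expected payoff when an identical partner policy is passed through a
    random relabeling of the symmetric levers (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        perm = SYMMETRIC[:]
        rng.shuffle(perm)
        relabel = dict(zip(SYMMETRIC, perm))
        partner_action = relabel.get(action, action)  # unique lever is fixed
        if partner_action == action:
            total += PAYOFFS[action]
    return total / n_samples

# Any 1.0 lever has self-play value 1.0, but its other-play value collapses
# to roughly 1/9, since a relabeled partner rarely lands on the same lever.
# The unique 0.9 lever keeps its full value under every relabeling.
print(op_value(0))  # roughly 0.11 under these assumptions
print(op_value(9))  # roughly 0.9
```

Maximizing this symmetry-averaged objective therefore steers learning toward the unique lever, the convention a novel partner can also find, which is the mechanism OP applies to richer games like Hanabi.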

Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster • 2020

Related benchmarks

Task | Dataset | Metric | Score | Rank
Cooperative Play | Hanabi Self-play | Score | 24.14 | 5
Ad-hoc Coordination | Hanabi w/ Color Bot | Game Score | 14.8 | 5
Ad-hoc Coordination | Hanabi w/ Clone Bot | Score | 13.03 | 5
Cooperative Play | Hanabi Cross-Play | Score | 21.77 | 5
Ad-hoc Coordination | Hanabi w/ Rank Bot | Score | 12.36 | 4
