Learning to Cooperate with Humans using Generative Agents

About

Training agents that can coordinate zero-shot with humans is a key mission in multi-agent reinforcement learning (MARL). Current algorithms focus on training simulated human partner policies which are then used to train a Cooperator agent. The simulated human is produced either through behavior cloning over a dataset of human cooperation behavior, or by using MARL to create a population of simulated agents. However, these approaches often struggle to produce a Cooperator that can coordinate well with real humans, since the simulated humans fail to cover the diverse strategies and styles employed by people in the real world. We show that learning a generative model of human partners can effectively address this issue. Our model learns a latent variable representation of the human that can be regarded as encoding the human's unique strategy, intention, experience, or style. This generative model can be flexibly trained from any (human or neural policy) agent interaction data. By sampling from the latent space, we can use the generative model to produce different partners to train Cooperator agents. We evaluate our method, Generative Agent Modeling for Multi-agent Adaptation (GAMMA), on Overcooked, a challenging cooperative cooking game that has become a standard benchmark for zero-shot coordination. We conduct an evaluation with real human teammates, and the results show that GAMMA consistently improves performance, whether the generative model is trained on simulated populations or human datasets. Further, we propose a method for posterior sampling from the generative model that is biased towards the human data, enabling us to efficiently improve performance with only a small amount of expensive human interaction data.
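The partner-sampling idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the model architecture, so the linear softmax "decoder", the Gaussian prior, the latent and observation sizes, and the function names `make_partner_policy`, `sample_prior`, and `sample_human_biased` are all assumptions made for illustration. It shows two ways of drawing a latent code: from a prior (to get diverse partners) and from posteriors inferred on human data (to bias partners towards human-like behavior).

```python
import math
import random

# All sizes below are hypothetical placeholders, not values from the paper.
LATENT_DIM = 4
OBS_DIM = 6
NUM_ACTIONS = 6


def make_partner_policy(rng):
    """Random linear stand-in for a trained latent-conditioned policy decoder."""
    weights = [[rng.gauss(0, 1) for _ in range(NUM_ACTIONS)]
               for _ in range(OBS_DIM + LATENT_DIM)]

    def policy(obs, z):
        # Condition on both the observation and the latent partner code z.
        x = list(obs) + list(z)
        logits = [sum(x[i] * weights[i][a] for i in range(len(x)))
                  for a in range(NUM_ACTIONS)]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

    return policy


def sample_prior(rng):
    """Draw a latent from an assumed standard normal prior."""
    return [rng.gauss(0, 1) for _ in range(LATENT_DIM)]


def sample_human_biased(rng, posteriors):
    """Draw a latent from one of the (mean, std) posteriors that a
    hypothetical encoder inferred from human trajectories."""
    mean, std = rng.choice(posteriors)
    return [m + s * rng.gauss(0, 1) for m, s in zip(mean, std)]


rng = random.Random(0)
policy = make_partner_policy(rng)

# Placeholder posteriors; in practice these would come from encoding human data.
human_posteriors = [([rng.gauss(0, 1) for _ in range(LATENT_DIM)],
                     [0.1] * LATENT_DIM) for _ in range(3)]

latents = [sample_prior(rng) for _ in range(4)]
latents += [sample_human_biased(rng, human_posteriors) for _ in range(4)]

obs = [rng.gauss(0, 1) for _ in range(OBS_DIM)]
partner_action_probs = [policy(obs, z) for z in latents]
```

Each entry of `partner_action_probs` is the action distribution of a different simulated partner on the same observation; a Cooperator would be trained against partners drawn this way rather than against a single fixed policy.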

Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon S. Du, Natasha Jaques • 2024

Related benchmarks

Task	Dataset	Result
Coordination	Overcooked Cramped Room layout v1	SP 181	14
Human-Agent Coordination	Overcooked Counter Circuit (human evaluation)	Average Score 91.11	9
Human-Agent Coordination	Overcooked Multi-strategy Counter (human evaluation)	Average Score 93.09	9
Zero-shot Coordination	Overcooked Asymmetric Advantages layout v1	SP 240	7
Coordination	Overcooked Coordination Ring layout v1	SP 143	7
Coordination	Overcooked Forced Coordination layout v1	SP 108	7
Zero-shot Coordination	Overcooked Coordination Ring layout v1	SP 143	7
Zero-shot Coordination	Overcooked Forced Coordination layout v1	SP 108	7
Coordination	Overcooked Counter Circuit layout v1	SP 66	7
Zero-shot Coordination	Overcooked Counter Circuit layout v1	SP Score 66	7

Showing 10 of 13 rows
