Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Learning to Cooperate with Humans using Generative Agents

About

Training agents that can coordinate zero-shot with humans is a key mission in multi-agent reinforcement learning (MARL). Current algorithms focus on training simulated human partner policies which are then used to train a Cooperator agent. The simulated human is produced either through behavior cloning over a dataset of human cooperation behavior, or by using MARL to create a population of simulated agents. However, these approaches often struggle to produce a Cooperator that can coordinate well with real humans, since the simulated humans fail to cover the diverse strategies and styles employed by people in the real world. We show \emph{learning a generative model of human partners} can effectively address this issue. Our model learns a latent variable representation of the human that can be regarded as encoding the human's unique strategy, intention, experience, or style. This generative model can be flexibly trained from any (human or neural policy) agent interaction data. By sampling from the latent space, we can use the generative model to produce different partners to train Cooperator agents. We evaluate our method -- \textbf{G}enerative \textbf{A}gent \textbf{M}odeling for \textbf{M}ulti-agent \textbf{A}daptation (GAMMA) -- on Overcooked, a challenging cooperative cooking game that has become a standard benchmark for zero-shot coordination. We conduct an evaluation with real human teammates, and the results show that GAMMA consistently improves performance, whether the generative model is trained on simulated populations or human datasets. Further, we propose a method for posterior sampling from the generative model that is biased towards the human data, enabling us to efficiently improve performance with only a small amount of expensive human interaction data.

Yancheng Liang, Daphne Chen, Abhishek Gupta, Simon S. Du, Natasha Jaques• 2024

Related benchmarks

TaskDatasetResultRank
CoordinationOvercooked Cramped Room layout v1
SP181
14
Human-Agent CoordinationOvercooked Counter Circuit (human evaluation)
Average Score91.11
9
Human-Agent CoordinationOvercooked Multi-strategy Counter (human evaluation)
Average Score93.09
9
Zero-shot CoordinationOvercooked Asymmetric Advantages layout v1
SP240
7
CoordinationOvercooked Coordination Ring layout v1
SP143
7
CoordinationOvercooked Forced Coordination layout v1
SP108
7
Zero-shot CoordinationOvercooked Coordination Ring layout v1
SP143
7
Zero-shot CoordinationOvercooked Forced Coordination layout v1
SP108
7
CoordinationOvercooked Counter Circuit layout v1
SP66
7
Zero-shot CoordinationOvercooked Counter Circuit layout v1
SP Score66
7
Showing 10 of 13 rows

Other info

Code

Follow for update