
A Hierarchical Approach to Population Training for Human-AI Collaboration

About

A major challenge for deep reinforcement learning (DRL) agents is collaborating with novel partners that they did not encounter during training. The problem is worse when the partners are human, because inconsistent human behavior increases the variance in action responses. Recent work has shown that training a single agent as the best response to a diverse population of training partners significantly increases the agent's robustness to novel partners. We further enhance the population-based training approach by introducing a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration. Our agent learns multiple best-response policies as its low-level policies and, at the same time, a high-level policy that acts as a manager, allowing the agent to switch dynamically between the low-level best-response policies based on its current partner. We demonstrate that our method adapts dynamically to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment. We also conducted a human study in the same environment to test the effectiveness of our method when partnering with real human subjects.
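
The manager-and-workers structure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name `HierarchicalAgent`, the two scripted low-level policies, and the rule-based manager are hypothetical stand-ins for the learned best-response policies and the learned high-level policy, and the action strings are not the Overcooked action space.

```python
from collections import Counter, deque
from typing import Callable, Deque, Dict, List

Observation = Dict[str, int]
Policy = Callable[[Observation], str]
Manager = Callable[[List[str]], int]


# Hypothetical low-level policies; in the paper these are learned
# best responses to different training partners.
def fetch_onions(obs: Observation) -> str:
    return "fetch_onion"


def serve_dishes(obs: Observation) -> str:
    return "serve_dish"


def rule_based_manager(history: List[str]) -> int:
    """Hypothetical stand-in for the learned high-level policy.

    Returns index 1 (serve dishes) when the partner has mostly been
    fetching onions in the recent window, otherwise index 0.
    """
    counts = Counter(history)
    return 1 if counts["fetch_onion"] > counts["serve_dish"] else 0


class HierarchicalAgent:
    """Selects a low-level policy at every step from recent partner actions."""

    def __init__(self, low_level: List[Policy], manager: Manager, window: int = 3):
        self.low_level = low_level
        self.manager = manager
        # Sliding window of the partner's most recent actions.
        self.history: Deque[str] = deque(maxlen=window)

    def act(self, obs: Observation, partner_action: str) -> str:
        self.history.append(partner_action)
        index = self.manager(list(self.history))
        return self.low_level[index](obs)


agent = HierarchicalAgent([fetch_onions, serve_dishes], rule_based_manager)
first = agent.act({}, "fetch_onion")   # partner fetches, so the agent serves
second = agent.act({}, "serve_dish")   # window is tied, so the agent fetches
```

The point of the sketch is only the control flow: the manager re-evaluates its choice at each step, so the agent can change low-level policy mid-episode when the partner's behavior changes.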

Yi Loo, Chen Gong, Malika Meghjani • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Cooperative Multi-Agent Coordination | Overcooked-AI Coordination Ring | Total Mean Reward: 96 | 4 |
| Collaborative Cooking | Overcooked Asym. Adv. | Total Mean Reward: 66.25 | 4 |
| Collaborative Cooking | Overcooked Coord. Ring | Mean Reward: 77.5 | 4 |
| Collaborative Cooking | Overcooked Counter Circuit | Total Mean Reward: 35 | 4 |
| Collaborative Cooking | Overcooked Forced Coord. | Total Mean Reward: 25.2 | 4 |
| Cooperative Cooking | Overcooked-AI Cramped Room | Total Mean Reward: 93.13 | 4 |
| Cooperative Multi-Agent Coordination | Overcooked-AI Cramped Room | Total Mean Reward: 117.9 | 4 |
| Cooperative Multi-Agent Coordination | Overcooked-AI Asymmetric Advantages | Mean Reward: 86.2 | 4 |
| Cooperative Multi-Agent Coordination | Overcooked-AI Counter Circuit | Total Mean Reward: 38.1 | 4 |
| Cooperative Multi-Agent Coordination | Overcooked-AI Forced Coordination | Total Mean Reward: 35.6 | 4 |
Showing 10 of 21 rows
