
Heterogeneous-Agent Reinforcement Learning

About

The necessity for cooperation among intelligent machines has popularised cooperative multi-agent reinforcement learning (MARL) in AI research. However, many research endeavours rely heavily on parameter sharing among agents, which confines them to the homogeneous-agent setting and leads to training instability and a lack of convergence guarantees. To achieve effective cooperation in the general heterogeneous-agent setting, we propose Heterogeneous-Agent Reinforcement Learning (HARL) algorithms that resolve the aforementioned issues. Central to our findings are the multi-agent advantage decomposition lemma and the sequential update scheme. Based on these, we develop the provably correct Heterogeneous-Agent Trust Region Learning (HATRL) and derive HATRPO and HAPPO through tractable approximations. Furthermore, we discover a novel framework named Heterogeneous-Agent Mirror Learning (HAML), which strengthens the theoretical guarantees for HATRPO and HAPPO and provides a general template for cooperative MARL algorithmic design. We prove that all algorithms derived from HAML inherently enjoy monotonic improvement of joint return and convergence to a Nash equilibrium. As a natural outcome, HAML validates further novel algorithms beyond HATRPO and HAPPO, including HAA2C, HADDPG, and HATD3, which generally outperform their existing MA-counterparts. We comprehensively test HARL algorithms on six challenging benchmarks and demonstrate their superior effectiveness and stability for coordinating heterogeneous agents compared to strong baselines such as MAPPO and QMIX.
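The sequential update scheme mentioned above can be sketched in a few lines: agents update one at a time, and each agent optimises against the joint advantage reweighted by the importance ratios of the agents that have already updated. The sketch below is an illustrative reconstruction from the abstract, not the authors' implementation; the function name and interface are hypothetical, and the per-agent ratios are assumed to come from each agent's own policy optimisation step.

```python
import numpy as np

def sequential_advantages(ratios_after_update, joint_adv):
    """Core bookkeeping of a HAPPO-style sequential update (illustrative).

    ratios_after_update: list of 1-D arrays, one per agent in update order;
                         each array holds that agent's post-update importance
                         ratio pi_new/pi_old per sample.
    joint_adv:           1-D array of joint advantage estimates per sample.

    Returns the effective advantage M_i * A that agent i optimises against,
    where M_i is the running product of the ratios of agents 1..i-1.
    """
    m = np.ones_like(joint_adv)          # M_1 = 1: first agent sees raw A
    effective = []
    for r in ratios_after_update:
        effective.append(m * joint_adv)  # agent i's surrogate uses M_i * A
        m = m * r                        # fold agent i's new ratio into M
    return effective
```

Because each agent's objective already accounts for its predecessors' policy changes, the joint surrogate improves monotonically, which is what parameter-shared simultaneous updates cannot guarantee.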

Yifan Zhong, Jakub Grudzien Kuba, Xidong Feng, Siyi Hu, Jiaming Ji, Yaodong Yang • 2023

Related benchmarks

Task                               | Dataset                         | Metric       | Result | Rank
Heterogeneous Coordination         | OSP                             | Success Rate | 87.9   | 16
Heterogeneous Coordination         | SLH                             | Success Rate | 84.4   | 16
Heterogeneous Coordination         | SCT                             | Success Rate | 83.4   | 16
Multi-Agent Reinforcement Learning | SMAC 3s5z vs 3s6z               | Win Rate     | 46.1   | 8
Multi-Agent Reinforcement Learning | SMAC 5m vs 6m                   | Win Rate     | 35     | 8
Multi-Agent Reinforcement Learning | SMAC MMM2                       | Win Rate     | 50.9   | 8
Collaborative Transport            | Isaac Lab OSP S11: Alignment    | Success Rate | 83.6   | 4
Collaborative Transport            | Isaac Lab OSP S12: Turnaround   | Success Rate | 77.6   | 4
Collaborative Transport            | Isaac Lab OSP S13: Corner Entry | Success Rate | 72.7   | 4
Collaborative Transport            | Isaac Lab SCT S21: Narrow Gate  | Success Rate | 80.1   | 4

(Showing 10 of 20 rows)