Self-Motivated Multi-Agent Exploration

About

In cooperative multi-agent reinforcement learning (CMARL), agents must balance self-exploration against team collaboration. Agents can hardly accomplish a team task without coordination, yet they can become trapped in a local optimum in which easy cooperation is reached without enough individual exploration. Recent works mainly concentrate on agents' coordinated exploration, which makes the state space to be explored grow exponentially. To address this issue, we propose Self-Motivated Multi-Agent Exploration (SMMAE), which aims to achieve success in team tasks by adaptively finding a trade-off between self-exploration and team cooperation. In SMMAE, we train an independent exploration policy for each agent to maximize its own visited state space. Each agent learns an adjustable exploration probability based on the stability of the joint team policy. Experiments on highly cooperative tasks in the StarCraft II micromanagement benchmark (SMAC) demonstrate that SMMAE explores task-related states more efficiently, accomplishes coordinated behaviours, and boosts learning performance.
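
The per-agent mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `AgentSketch`, its methods, and the callables `team_policy` and `explore_policy` are hypothetical names introduced here. The count-based novelty bonus is a simple stand-in for "maximizing the visited state space", and the exploration probability is held fixed rather than learned from the stability of the joint team policy, since the abstract does not specify that update rule.

```python
import random
from collections import Counter


class AgentSketch:
    """Hypothetical per-agent wrapper (illustrative only, not the SMMAE code)."""

    def __init__(self, explore_prob, seed=0):
        # In SMMAE this probability is learned per agent; here it is a fixed
        # number, which is an assumption of this sketch.
        self.explore_prob = explore_prob
        self.visits = Counter()  # visit counts per hashable observation
        self.rng = random.Random(seed)

    def novelty_bonus(self, obs):
        # Count-based stand-in for rewarding newly visited states:
        # the bonus shrinks as 1 / sqrt(visit count).
        self.visits[obs] += 1
        return 1.0 / self.visits[obs] ** 0.5

    def act(self, obs, team_policy, explore_policy):
        # With probability explore_prob follow the agent's own exploration
        # policy; otherwise follow the cooperative team policy.
        if self.rng.random() < self.explore_prob:
            return explore_policy(obs)
        return team_policy(obs)


# Usage with toy policies that ignore the observation.
agent = AgentSketch(explore_prob=0.3)
action = agent.act("s0", team_policy=lambda o: "attack", explore_policy=lambda o: "move")
bonus = agent.novelty_bonus("s0")
```

Keeping the two policies separate and mixing them only at action-selection time reflects the trade-off the abstract describes: each agent can explore on its own without changing the team policy that handles coordination.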

Shaowei Zhang, Jiahan Cao, Lei Yuan, Yang Yu, De-Chuan Zhan • 2023

Related benchmarks

Task	Dataset	Result
Multi-Agent Reinforcement Learning	MAMuJoCo Walker2d 6x1 (test)	Average Episodic Return: 12.16	13
Multi-Agent Reinforcement Learning	Level-Based Foraging 10x10-4p-3f v2 (test)	Final Episode Return: 12	10
Multi-Agent Reinforcement Learning	Level-Based Foraging 10x10-3p-5f v2 (test)	Final Episode Return: 7	10
Multi-Agent Reinforcement Learning	Level-Based Foraging 2s-8x8-2p-2f-coop v2 (test)	Final Episode Return: 2	10
Multi-Agent Reinforcement Learning	Level-Based Foraging 2s-10x10-3p-3f v2 (test)	Final Episode Return: 10	10
Multi-Agent Reinforcement Learning	MAMuJoCo Ant 8x1 (test)	Average Episodic Return: 22.62	8
Multi-Agent Reinforcement Learning	MAMuJoCo Hopper 3x1 (test)	Average Episodic Return: 12.52	8
Multi-Agent Reinforcement Learning	MAMuJoCo HalfCheetah 6x1 (test)	Average Episodic Return: -3.28	8

