Self-Motivated Multi-Agent Exploration

About

In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to balance self-exploration and team collaboration. Agents can hardly accomplish the team task without coordination, yet without enough individual exploration they become trapped in a local optimum where only easy cooperation is reached. Recent works mainly concentrate on agents' coordinated exploration, which causes the state space to be explored to grow exponentially. To address this issue, we propose Self-Motivated Multi-Agent Exploration (SMMAE), which aims to succeed in team tasks by adaptively finding a trade-off between self-exploration and team cooperation. In SMMAE, we train an independent exploration policy for each agent to maximize its own visited state space. Each agent learns an adjustable exploration probability based on the stability of the joint team policy. Experiments on highly cooperative tasks in the StarCraft II micromanagement benchmark (SMAC) demonstrate that SMMAE can explore task-related states more efficiently, accomplish coordinated behaviours, and boost learning performance.

Shaowei Zhang, Jiahan Cao, Lei Yuan, Yang Yu, De-Chuan Zhan • 2023

Related benchmarks

Task | Dataset | Result | Rank
Multi-Agent Reinforcement Learning | Level-Based Foraging 10x10-4p-3f v2 (test) | Final Episode Return: 12 | 10
Multi-Agent Reinforcement Learning | Level-Based Foraging 10x10-3p-5f v2 (test) | Final Episode Return: 7 | 10
Multi-Agent Reinforcement Learning | Level-Based Foraging 2s-8x8-2p-2f-coop v2 (test) | Final Episode Return: 2 | 10
Multi-Agent Reinforcement Learning | Level-Based Foraging 2s-10x10-3p-3f v2 (test) | Final Episode Return: 10 | 10
Multi-Agent Reinforcement Learning | MAMuJoCo Ant 8x1 (test) | Average Episodic Return: 22.62 | 8
Multi-Agent Reinforcement Learning | MAMuJoCo Hopper 3x1 (test) | Average Episodic Return: 12.52 | 8
Multi-Agent Reinforcement Learning | MAMuJoCo Walker2d 6x1 (test) | Average Episodic Return: 12.16 | 8
Multi-Agent Reinforcement Learning | MAMuJoCo HalfCheetah 6x1 (test) | Average Episodic Return: -3.28 | 8