Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration

About

With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, the redundant effort caused by exploration without proper guidance poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, which channels informative task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states, which are critical for task fulfillment, in a discriminative manner at low LLM inference cost. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR) to guide agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. By reducing redundant exploration, LEMAE outperforms existing SOTA approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.
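As a rough illustration of the two ideas named in the abstract (discriminating key states with simple functions, then densifying reward in hindsight), the following minimal Python sketch may help. It is not the paper's SHIR formula: the discriminator `near_door`, the door position, the distance-reduction reward, and the choice of subspace indices are all assumptions made for this example.

```python
import math
from typing import Callable, List, Optional, Sequence

State = Sequence[float]


def near_door(state: State) -> bool:
    """Hypothetical key-state discriminator: the first two state entries
    (assumed to be an agent's x, y) are within 0.5 of a door at (1.0, 0.0)."""
    return math.dist(tuple(state[:2]), (1.0, 0.0)) < 0.5


def first_hit(trajectory: List[State],
              discriminator: Callable[[State], bool]) -> Optional[int]:
    """Index of the first state the discriminator accepts, or None."""
    for t, state in enumerate(trajectory):
        if discriminator(state):
            return t
    return None


def hindsight_rewards(trajectory: List[State],
                      discriminator: Callable[[State], bool],
                      subspace: Sequence[int]) -> List[float]:
    """One intrinsic reward per transition (len(trajectory) - 1 values).

    Assumed rule: once the episode is over, treat the first reached key
    state as a goal and reward each earlier transition by how much it
    reduced the distance to that goal, measured only on the state
    dimensions listed in `subspace`.
    """
    rewards = [0.0] * (len(trajectory) - 1)
    hit = first_hit(trajectory, discriminator)
    if hit is None:
        return rewards  # no key state reached, so no hindsight signal

    goal = tuple(trajectory[hit][i] for i in subspace)

    def dist(state: State) -> float:
        return math.dist(tuple(state[i] for i in subspace), goal)

    for t in range(hit):
        rewards[t] = dist(trajectory[t]) - dist(trajectory[t + 1])
    return rewards


# Toy episode: (x, y, unrelated feature); the agent walks toward the door.
episode = [(0.0, 0.0, 9.0), (0.5, 0.0, 9.0), (0.8, 0.0, 9.0), (0.9, 0.0, 9.0)]
shaped = hindsight_rewards(episode, near_door, subspace=(0, 1))
```

In this toy episode the discriminator first fires at the third state, so only the two transitions leading up to it receive a nonzero reward, and the unrelated third feature is ignored because it lies outside the chosen subspace.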

Yun Qu, Boyuan Wang, Yuhang Jiang, Jianzhun Shao, Yixiu Mao, Heming Zou, Chang Liu, Cheems Wang, Meiqin Liu, Xiangyang Ji • 2024

Related benchmarks

Task	Dataset	Result
Exploration	MPE Pass	Exploration Steps (k): 153.1	2
Exploration	MPE Large-Pass	Exploration Steps (k): 446.9	2
Exploration	MPE Push-Box	Exploration Steps (k): 159	2
Exploration	MPE Secret-Room	Exploration Steps (k): 3.17e+5	2
