
Generalized Per-Agent Advantage Estimation for Multi-Agent Policy Optimization

About

In this paper, we propose a novel framework for multi-agent reinforcement learning that enhances sample efficiency and coordination through accurate per-agent advantage estimation. The core of our approach is the Generalized Per-Agent Advantage Estimator (GPAE), which employs a per-agent value iteration operator to compute precise per-agent advantages. This operator enables stable off-policy learning by estimating values indirectly via action probabilities, eliminating the need for direct Q-function estimation. To further refine estimation, we introduce a double-truncated importance sampling ratio scheme, which improves credit assignment for off-policy trajectories by balancing sensitivity to an agent's own policy changes with robustness to the non-stationarity induced by other agents. Experiments on standard multi-agent benchmarks demonstrate that our approach outperforms existing methods, excelling in coordination and sample efficiency in complex scenarios.
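To make the double-truncated idea concrete, here is a minimal sketch of a GAE-style per-agent advantage recursion with two separately truncated importance ratios. This is an illustration of the general scheme described above, not the paper's exact estimator: the function name, the clipping constants `c_own` and `c_others`, and the split of the joint ratio into `rho_own` (the agent's own policy ratio) and `rho_others` (the product of the other agents' ratios) are all assumptions for exposition.

```python
import numpy as np

def per_agent_gae(rewards, values, rho_own, rho_others,
                  gamma=0.99, lam=0.95, c_own=2.0, c_others=1.1):
    """Illustrative GAE-style per-agent advantage with a double-truncated
    importance ratio (a sketch, not the paper's exact GPAE operator).

    rho_own[t]    -- agent i's own ratio pi_new(a_i|s_t) / pi_behav(a_i|s_t)
    rho_others[t] -- product of the other agents' ratios at step t

    The own ratio is truncated loosely (c_own) to remain sensitive to the
    agent's own policy change, while the others' ratio is truncated tightly
    (c_others) to damp non-stationarity from teammates. The constants here
    are hypothetical choices for illustration.
    """
    T = len(rewards)
    adv = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # Double truncation: clip each ratio to its own ceiling, then combine.
        rho = np.clip(rho_own[t], 0.0, c_own) * np.clip(rho_others[t], 0.0, c_others)
        # Standard one-step TD error against the learned value function.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Importance-weighted GAE recursion (backward in time).
        gae = rho * (delta + gamma * lam * gae)
        adv[t] = gae
    return adv
```

In the on-policy case (all ratios equal to 1 and below both ceilings) this reduces exactly to standard GAE, which is the sanity check one would want any such estimator to pass.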

Seongmin Kim, Giseung Park, Woojun Kim, Jiwon Jeon, Seungyul Han, Youngchul Sung • 2026

Related benchmarks

Task             Dataset        Metric             Result     Rank
5m vs 6m         SMAC           Win Rate           93.7       13
6h vs 8z         SMAC           Win Rate           99.8       12
3s5z vs 3s6z     SMAC           Win Rate           87.3       12
10m vs 11m       SMAC           Win Rate           98.5       12
smacv2_10_units  SMAX SMACv2    Average Win Rate   75         7
smacv2_5_units   SMAX SMACv2    Average Win Rate   81         7
ant-4x2          MABrax         Episode Return     3.57e+3    5
ant-8x1          MABrax         Episode Return     3.29e+3    5
halfcheetah-6x1  MABrax         Episode Return     3.46e+3    5
hopper-3x1       MABrax         Episode Return     1.57e+3    5

Showing 10 of 12 rows
