
Learning Partial Action Replacement in Offline MARL

About

Offline multi-agent reinforcement learning (MARL) faces a critical challenge: the joint action space grows exponentially with the number of agents, making dataset coverage exponentially sparse and out-of-distribution (OOD) joint actions unavoidable. Partial Action Replacement (PAR) mitigates this by anchoring a subset of agents to dataset actions, but the existing approach relies on enumerating multiple subset configurations at high computational cost and cannot adapt to varying states. We introduce PLCQL, a framework that formulates PAR subset selection as a contextual bandit problem and learns a state-dependent PAR policy using Proximal Policy Optimisation with an uncertainty-weighted reward. This adaptive policy dynamically determines how many agents to replace at each update step, balancing policy improvement against conservative value estimation. We prove a value-error bound showing that the estimation error scales linearly with the expected number of deviating agents. Compared with the previous PAR-based method SPaCQL, PLCQL reduces the number of per-iteration Q-function evaluations from n to 1, significantly improving computational efficiency. Empirically, PLCQL achieves the highest normalised scores on 66% of tasks across MPE, MaMuJoCo, and SMAC benchmarks, outperforming SPaCQL on 84% of tasks while substantially reducing computational cost.
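
As a rough illustration of the partial action replacement idea described above, the following minimal sketch builds a joint action in which k agents take policy actions while the remaining agents stay anchored to their dataset actions. It is not the authors' implementation: the function name `partial_action_replacement`, the array shapes, and the uniform random choice of which agents to replace are all assumptions made for illustration. In PLCQL, the number of replaced agents would come from the learned state-dependent PAR policy rather than being passed in by hand.

```python
import numpy as np


def partial_action_replacement(dataset_actions, policy_actions, k, rng):
    """Mix dataset and policy actions for one joint action (hypothetical helper).

    dataset_actions, policy_actions: arrays of shape (n_agents, action_dim).
    k: number of agents whose dataset action is replaced by the policy action.
    rng: a numpy random Generator used to pick which agents deviate
         (uniform selection is an assumption, not taken from the paper).
    Returns the mixed joint action and the indices of the replaced agents.
    """
    n_agents = dataset_actions.shape[0]
    if not 1 <= k <= n_agents:
        raise ValueError("k must be between 1 and the number of agents")

    # Start from the in-distribution joint action recorded in the dataset.
    mixed = dataset_actions.copy()

    # Choose k distinct agents that are allowed to deviate from the dataset.
    replaced = rng.choice(n_agents, size=k, replace=False)
    mixed[replaced] = policy_actions[replaced]
    return mixed, replaced


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dataset_actions = np.zeros((4, 2))  # 4 agents, 2-dimensional actions
    policy_actions = np.ones((4, 2))
    mixed, replaced = partial_action_replacement(
        dataset_actions, policy_actions, k=2, rng=rng
    )
    print("replaced agents:", sorted(replaced.tolist()))
    print(mixed)
```

A smaller k keeps the joint action closer to the dataset (more conservative), while a larger k lets more agents follow the learned policy. That is the trade-off the abstract describes between conservative value estimation and policy improvement.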

Yue Jin, Giovanni Montana • 2026

Related benchmarks

Task                          Dataset                              Metric             Result  Rank
StarCraft II micromanagement  StarCraft II 2s3z medium             Win Rate           51      18
StarCraft II micromanagement  StarCraft II 2s3z medium_replay      Win Rate           76      18
StarCraft II micromanagement  StarCraft II 2s3z expert             Win Rate           100     18
StarCraft II micromanagement  StarCraft II 2s3z mixed              Win Rate           97      18
StarCraft II micromanagement  StarCraft II 3s_vs_5z medium         Win Rate           37      18
StarCraft II micromanagement  StarCraft II 5m_vs_6m medium         Win Rate           46      18
StarCraft II micromanagement  StarCraft II 5m_vs_6m medium_replay  Win Rate           24      18
StarCraft II micromanagement  StarCraft II 5m_vs_6m mixed          Win Rate           82      18
StarCraft II micromanagement  StarCraft II 6h_vs_8z medium         Test Winning Rate  48      18
StarCraft II micromanagement  StarCraft II 6h_vs_8z (mixed)        Win Rate           56      18
Showing 10 of 32 rows
