Partial Action Replacement: Tackling Distribution Shift in Offline MARL
About
Offline multi-agent reinforcement learning (MARL) is severely hampered by the difficulty of evaluating out-of-distribution (OOD) joint actions. Our core finding is that when the behavior policy is factorized, a common scenario in which agents act fully or partially independently during data collection, a strategy of partial action replacement (PAR) can significantly mitigate this challenge. PAR updates the actions of a single agent or a subset of agents while the others remain fixed to the behavioral data, which reduces distribution shift compared to full joint-action updates. Based on this insight, we develop Soft-Partial Conservative Q-Learning (SPaCQL), which uses PAR to mitigate the OOD issue and dynamically weights different PAR strategies based on the uncertainty of value estimation. We provide a rigorous theoretical foundation for this approach, proving that under factorized behavior policies, the induced distribution shift scales linearly with the number of deviating agents rather than exponentially with the joint-action space. This yields a provably tighter value error bound for this important class of offline MARL problems. Our theoretical results also indicate that SPaCQL adaptively addresses distribution shift using uncertainty-informed weights. Our empirical results demonstrate that SPaCQL enables more effective policy learning and substantially outperforms baseline algorithms when the offline dataset exhibits the independence structure.
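To make the replacement step concrete, the following is a minimal NumPy sketch of partial action replacement, not the authors' implementation. The function names `partial_action_replacement` and `uncertainty_weights`, the array shapes, and the random choice of which agents deviate are all assumptions made for illustration. In particular, the abstract only says that PAR strategies are weighted by value-estimation uncertainty; the softmax over negative ensemble standard deviation used here is a hypothetical stand-in for that rule.

```python
import numpy as np


def partial_action_replacement(data_actions, policy_actions, k, rng):
    """Replace the actions of k randomly chosen agents in each sample.

    data_actions, policy_actions: arrays of shape (batch, n_agents, act_dim).
    Returns the mixed joint actions and the boolean replacement mask.
    (Hypothetical helper; random agent selection is an assumption.)
    """
    batch, n_agents, _ = data_actions.shape
    if not 0 <= k <= n_agents:
        raise ValueError("k must be between 0 and n_agents")
    # Rank random scores within each row; ranks below k mark deviating agents.
    ranks = rng.random((batch, n_agents)).argsort(axis=1).argsort(axis=1)
    mask = ranks < k  # shape (batch, n_agents), exactly k True per row
    # Deviating agents take the policy action; the rest keep the dataset action.
    mixed = np.where(mask[..., None], policy_actions, data_actions)
    return mixed, mask


def uncertainty_weights(q_ensemble, temperature=1.0):
    """Weight PAR strategies so that lower Q-ensemble disagreement scores higher.

    q_ensemble: array of shape (n_strategies, n_ensemble, batch).
    Returns weights of shape (n_strategies, batch) that sum to 1 per sample.
    (Assumed weighting rule, not taken from the paper.)
    """
    uncertainty = q_ensemble.std(axis=1)  # (n_strategies, batch)
    logits = -uncertainty / temperature
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=0, keepdims=True)


# Usage: 4 samples, 3 agents, 2-dimensional actions, one deviating agent.
rng = np.random.default_rng(0)
data_actions = np.zeros((4, 3, 2))
policy_actions = np.ones((4, 3, 2))
mixed, mask = partial_action_replacement(data_actions, policy_actions, k=1, rng=rng)

# Three strategies (for example k = 1, 2, 3), each scored by a 5-member ensemble.
weights = uncertainty_weights(rng.normal(size=(3, 5, 4)))
```

With `k=1` each mixed joint action differs from the dataset action in exactly one agent's slot, which is the sense in which the deviation from the behavioral data stays small compared with replacing all agents at once.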
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| StarCraft II micromanagement | StarCraft II 2s3z medium | Win Rate | 46 | 18 |
| StarCraft II micromanagement | StarCraft II 2s3z medium_replay | Win Rate | 56 | 18 |
| StarCraft II micromanagement | StarCraft II 2s3z mixed | Win Rate | 96 | 18 |
| StarCraft II micromanagement | StarCraft II 5m_vs_6m medium | Win Rate | 33 | 18 |
| StarCraft II micromanagement | StarCraft II 5m_vs_6m medium_replay | Win Rate | 23 | 18 |
| StarCraft II micromanagement | StarCraft II 6h_vs_8z (mixed) | Win Rate | 52 | 18 |
| StarCraft II micromanagement | StarCraft II 3s_vs_5z mixed | Win Rate | 43 | 18 |
| StarCraft II micromanagement | StarCraft II 5m_vs_6m mixed | Win Rate | 78 | 18 |
| StarCraft II micromanagement | StarCraft II 6h_vs_8z medium | Test Winning Rate | 42 | 18 |
| StarCraft II micromanagement | StarCraft II 2s3z expert | Win Rate | 99 | 18 |