Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding
About
Multi-Agent Reinforcement Learning (MARL)-based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability. Several MARL-MAPF methods use communication to enrich the information an agent can perceive. However, existing works still struggle in structured environments with high obstacle density and large numbers of agents. To further improve the performance of communication-based MARL-MAPF solvers, we propose a new method, Ensembling Prioritized Hybrid Policies (EPH). We first propose a selective communication block to gather richer information for better agent coordination in multi-agent environments, and train the model with a Q-learning-based algorithm. We further introduce three advanced inference strategies aimed at bolstering performance during the execution phase. First, we hybridize the neural policy with single-agent expert guidance for navigating conflict-free zones. Second, we propose Q-value-based methods for prioritized resolution of conflicts as well as deadlock situations. Finally, we introduce a robust ensemble method that efficiently selects the best of multiple candidate solutions. We empirically evaluate EPH in complex multi-agent environments and demonstrate competitive performance against state-of-the-art neural methods for MAPF. We open-source our code at https://github.com/ai4co/eph-mapf.
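To illustrate the idea behind Q-value-based prioritized conflict resolution, here is a minimal sketch (not the exact EPH implementation; the function name, data layout, and "stay" fallback are assumptions for illustration): when two agents propose moving into the same cell, the agent whose chosen action has the higher Q-value keeps its move, and the lower-priority agent falls back to staying in place for that step.

```python
import numpy as np

def resolve_conflicts(proposed, q_values):
    """Resolve vertex conflicts by Q-value priority (illustrative sketch).

    proposed: dict mapping agent_id -> (target_cell, action_index)
    q_values: dict mapping agent_id -> np.ndarray of Q-values over actions

    Agents are processed in descending order of the Q-value of their
    chosen action; an agent whose target cell is already claimed by a
    higher-priority agent falls back to "stay" for this timestep.
    """
    final = {}
    occupied = {}  # target_cell -> winning agent_id
    order = sorted(
        proposed,
        key=lambda a: q_values[a][proposed[a][1]],
        reverse=True,
    )
    for agent in order:
        cell, action = proposed[agent]
        if cell in occupied:
            final[agent] = "stay"  # lower-priority agent waits this step
        else:
            occupied[cell] = agent
            final[agent] = action
    return final
```

In a full solver, the "stay" fallback would itself need conflict checking (an agent waiting in place can block a follower), which is part of what dedicated deadlock-resolution strategies handle; this sketch only shows the priority ordering.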
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-Agent Path Finding (MAPF) | random 32x32-20 | Success Rate | 100 | 77 |
| Multi-Agent Path Finding (MAPF) | random 64x64-20 | Success Rate | 98 | 73 |
| Multi-Agent Path Finding (MAPF) | den312d 65x81 | Success Rate | 100 | 32 |
| Multi-Agent Path Finding (MAPF) | warehouse 161x63 | Success Rate | 100 | 31 |
| Multi-Agent Path Finding | Random Map 120x120, 0.3 density | Success Rate | 93 | 15 |
| Multi-Agent Path Finding | Random Map 240x240, 0.3 density | Success Rate | 94 | 15 |