Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning
About
While deep neural networks (DNNs) have strengthened the performance of cooperative multi-agent reinforcement learning (c-MARL), agent policies can be easily perturbed by adversarial examples. Given the safety-critical applications of c-MARL, such as traffic management, power management, and unmanned aerial vehicle control, it is crucial to test the robustness of a c-MARL algorithm before it is deployed in the real world. Existing adversarial attacks for MARL could be used for testing, but each is limited to a single robustness aspect (e.g., reward, state, or action), while a c-MARL model could be attacked from any aspect. To overcome this challenge, we propose MARLSafe, the first robustness testing framework for c-MARL algorithms. First, motivated by the Markov Decision Process (MDP), MARLSafe considers the robustness of c-MARL algorithms comprehensively from three aspects, namely state robustness, action robustness, and reward robustness. Any c-MARL algorithm must simultaneously satisfy these robustness aspects to be considered secure. Second, due to the scarcity of c-MARL attacks, we propose c-MARL attacks as robustness testing algorithms from multiple aspects. Experiments on the SMAC environment reveal that many state-of-the-art c-MARL algorithms have low robustness in all aspects, pointing to the urgent need to test and enhance the robustness of c-MARL algorithms.
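To make the three robustness aspects concrete, the following is a minimal sketch of where a state, action, or reward perturbation would enter the agent-environment loop. It is not the MARLSafe implementation: `ToyCoopEnv`, `run_episode`, `flip_first`, and `negate` are hypothetical names, the toy task stands in for SMAC, and the perturbations are hand-written rather than learned attacks.

```python
from typing import Callable, List, Tuple


class ToyCoopEnv:
    """Hypothetical toy task (not SMAC): the team scores only if every
    agent outputs the shared target bit shown in its observation."""

    def __init__(self, n_agents: int = 3, horizon: int = 10):
        self.n_agents, self.horizon, self.t = n_agents, horizon, 0

    def _obs(self) -> List[int]:
        # Every agent observes the same target bit for the current step.
        return [self.t % 2] * self.n_agents

    def reset(self) -> List[int]:
        self.t = 0
        return self._obs()

    def step(self, actions: List[int]) -> Tuple[List[int], float, bool]:
        target = self.t % 2
        reward = 1.0 if all(a == target for a in actions) else 0.0
        self.t += 1
        return self._obs(), reward, self.t >= self.horizon


def identity(x):
    return x


def run_episode(env, policy: Callable,
                on_state: Callable = identity,
                on_action: Callable = identity,
                on_reward: Callable = identity) -> Tuple[float, float]:
    """Return (true team return, return as seen by the agents)."""
    obs, done = env.reset(), False
    true_return, observed_return = 0.0, 0.0
    while not done:
        actions = on_action(policy(on_state(obs)))  # state, then action hook
        obs, reward, done = env.step(actions)
        true_return += reward
        observed_return += on_reward(reward)        # reward hook
    return true_return, observed_return


def policy(obs: List[int]) -> List[int]:
    # Each agent simply copies the bit it observes.
    return list(obs)


def flip_first(values: List[int]) -> List[int]:
    # Perturb only the first agent's entry (observation or action).
    return [1 - values[0]] + list(values[1:])


def negate(reward: float) -> float:
    return -reward


results = {
    "clean": run_episode(ToyCoopEnv(), policy),
    "state": run_episode(ToyCoopEnv(), policy, on_state=flip_first),
    "action": run_episode(ToyCoopEnv(), policy, on_action=flip_first),
    "reward": run_episode(ToyCoopEnv(), policy, on_reward=negate),
}
```

In this toy setting, perturbing a single agent's observation or action is enough to drive the team return to zero, while the reward perturbation leaves behavior unchanged but corrupts the signal a learner would train on. That difference is one reason the three aspects need to be tested separately.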
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Adversarial Attack | Multi-Agent Particle Environment reference | Reward | -34.95 | 12 |
| Adversarial Attack | MPE spread | Reward Score | -1.01e+3 | 12 |
| Adversarial Attack | Google Research Football counterattack | Reward | 0.78 | 12 |
| Adversarial Attack | SMAC 1c3s5z | Reward | 10.79 | 12 |
| Adversarial Attack | SMAC 8m | Reward | 11.63 | 12 |
| Adversarial Attack | Google Research Football 3 vs 1 | Reward | 1.19 | 12 |
| Adversarial Attack | SMAC bane_vs_bane | Reward | 13.05 | 12 |
| Adversarial Attack | SMAC 27m_vs_30m | Reward | 13.36 | 12 |
| Attack Detection | SMAC 1c3s5z | F1 Score | 82 | 5 |
| Attack Detection | SMAC 8m | F1 Score | 86 | 5 |