G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems
About
Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks, ranging from collaborative problem-solving to autonomous decision-making. However, as these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors has raised significant concerns. To address this challenge, we introduce G-Safeguard, a topology-guided security lens and treatment for robust LLM-MAS, which leverages graph neural networks to detect anomalies on the multi-agent utterance graph and employs topological intervention for attack remediation. Extensive experiments demonstrate that G-Safeguard: (I) is highly effective under various attack strategies, recovering over 40% of the performance lost to prompt injection; (II) adapts readily to diverse LLM backbones and large-scale MAS; (III) combines seamlessly with mainstream MAS while providing security guarantees. The code is available at https://github.com/wslong20/G-safeguard.
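The detect-then-prune idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it replaces the trained graph neural network with a single round of mean-aggregation message passing and a hypothetical linear readout, and all names (`mean_aggregate`, `anomaly_scores`, `prune_edges`) are illustrative.

```python
def mean_aggregate(features, edges):
    """One round of message passing: each node averages its own feature
    vector with those of its in-neighbors on the utterance graph."""
    agg = {}
    for node, feat in features.items():
        stack = [feat] + [features[src] for src, dst in edges if dst == node]
        agg[node] = [sum(vals) / len(stack) for vals in zip(*stack)]
    return agg

def anomaly_scores(features, weights):
    """Score each agent with a (hypothetical) learned linear readout;
    the real system uses a trained GNN classifier."""
    return {n: sum(f * w for f, w in zip(feat, weights))
            for n, feat in features.items()}

def prune_edges(edges, scores, threshold):
    """Topological intervention: cut every edge originating from an
    agent whose anomaly score exceeds the threshold."""
    flagged = {n for n, s in scores.items() if s > threshold}
    return [(s, d) for s, d in edges if s not in flagged], flagged

# Toy utterance graph: agent C emits features resembling an injected prompt.
features = {"A": [0.1, 0.2], "B": [0.2, 0.1], "C": [1.5, 1.5]}
edges = [("A", "B"), ("B", "C"), ("C", "B")]

h = mean_aggregate(features, edges)
scores = anomaly_scores(h, weights=[1.0, 1.0])
clean_edges, flagged = prune_edges(edges, scores, threshold=1.5)
print(flagged)      # → {'C'}
print(clean_edges)  # → [('A', 'B'), ('B', 'C')]
```

After pruning, the flagged agent can no longer propagate its utterances to the rest of the system, which is the remediation step the abstract refers to as topological intervention.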
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Prompt Injection | MMLU | ASR@3 | 17 | 31 |
| Targeted Attack | InjecAgent | ASR@3 | 9.21 | 31 |
| Prompt Injection | CSQA | ASR@3 | 18.33 | 28 |
| Prompt Injection | GSM8K | ASR@3 | 6 | 28 |
| Malicious Agent | CSQA | ASR@3 | 0.0867 | 28 |
| Malicious Agent | PoisonRAG | ASR@3 | 6 | 28 |
| Malicious Advice Defense | PoisonRAG | ASR@3 | 13.3 | 18 |
| Prompt Injection | MMLU random topology | ASR (k=1) | 16.4 | 16 |
| Prompt Injection Defense | CSQA | ASR@3 | 26.3 | 16 |
| Prompt Injection Defense | GSM8K PI (Prompt Injection) (test) | ASR@1 | 3.7 | 16 |
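A small helper clarifies how an ASR@k figure like those in the table could be computed. Note the interpretation of ASR@k as "attack success rate, in percent, measured at dialogue turn k" is an assumption made for illustration; the function name and data layout are hypothetical.

```python
def asr_at_k(trials, k):
    """trials[i][t] is True iff the attack on trial i had succeeded by
    turn t+1. Returns the attack success rate at turn k, in percent."""
    hits = sum(1 for turns in trials if len(turns) >= k and turns[k - 1])
    return 100.0 * hits / len(trials)

trials = [
    [False, False, True],   # succeeds only at turn 3
    [False, False, False],  # never succeeds
    [True, True, True],     # succeeds immediately
    [False, True, True],    # succeeds at turn 2
]
print(asr_at_k(trials, 1))  # → 25.0
print(asr_at_k(trials, 3))  # → 75.0
```

Under this reading, a lower ASR@k after applying G-Safeguard indicates stronger remediation of the corresponding attack.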