
G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems

About

Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks, ranging from collaborative problem-solving to autonomous decision-making. However, as these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors has raised significant concerns. To address this challenge, we introduce G-Safeguard, a topology-guided security lens and treatment for robust LLM-MAS, which leverages graph neural networks to detect anomalies on the multi-agent utterance graph and employs topological intervention for attack remediation. Extensive experiments demonstrate that G-Safeguard: (I) exhibits significant effectiveness under various attack strategies, recovering over 40% of the performance lost to prompt injection; (II) is highly adaptable to diverse LLM backbones and large-scale MAS; (III) can be seamlessly combined with mainstream MAS with security guarantees. The code is available at https://github.com/wslong20/G-safeguard.
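The two-stage pipeline described above (anomaly detection on the utterance graph, followed by topological intervention) can be sketched in plain Python. This is a minimal illustrative toy, not the paper's implementation: the agent names, embeddings, the mean-aggregation scoring rule, and the threshold are all assumptions standing in for the learned GNN detector.

```python
# Hedged sketch of a detect-then-prune safeguard on a multi-agent
# utterance graph. Everything here (features, threshold, scoring rule)
# is an illustrative assumption, not G-Safeguard's actual model.

def aggregate(features, edges):
    """One round of mean aggregation over the directed utterance graph:
    each agent receives the average embedding of its in-neighbors."""
    agg = {}
    for node, feat in features.items():
        neighbors = [src for src, dst in edges if dst == node]
        if neighbors:
            agg[node] = [sum(features[n][i] for n in neighbors) / len(neighbors)
                         for i in range(len(feat))]
        else:
            agg[node] = feat
    return agg

def anomaly_scores(features, edges):
    """Score each agent by the distance between its own utterance
    embedding and its aggregated neighborhood -- a crude stand-in for
    a learned GNN anomaly head."""
    agg = aggregate(features, edges)
    return {n: sum((a - b) ** 2 for a, b in zip(features[n], agg[n])) ** 0.5
            for n in features}

def prune(edges, scores, threshold):
    """Topological intervention: cut every outgoing edge of a flagged
    agent so its messages stop propagating through the MAS."""
    flagged = {n for n, s in scores.items() if s > threshold}
    return [(s, d) for s, d in edges if s not in flagged], flagged

# Toy 4-agent graph; agent "A3" emits an outlier embedding
# (e.g., an injected prompt).
features = {
    "A0": [1.0, 0.0],
    "A1": [0.9, 0.1],
    "A2": [1.1, -0.1],
    "A3": [-5.0, 5.0],   # adversarial outlier
}
edges = [("A0", "A1"), ("A1", "A2"), ("A2", "A0"),
         ("A3", "A0"), ("A3", "A1"), ("A0", "A3")]

scores = anomaly_scores(features, edges)
clean_edges, flagged = prune(edges, scores, threshold=5.0)
print(flagged)       # {'A3'}
print(clean_edges)   # A3's outgoing edges are removed
```

In this toy run, only A3's embedding lies far from its neighborhood aggregate, so only its outgoing edges are cut; benign agents keep communicating. A real deployment would replace the hand-set threshold and mean aggregation with the trained GNN from the paper.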

Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang • 2025

Related benchmarks

Task                       Dataset                              Metric      Result   Rank
Prompt Injection           MMLU                                 ASR@3       17       31
Targeted Attack            InjecAgent                           ASR@3       9.21     31
Prompt Injection           CSQA                                 ASR@3       18.33    28
Prompt Injection           GSM8K                                ASR@3       6        28
Malicious Agent            CSQA                                 ASR@3       0.0867   28
Malicious Agent            PoisonRAG                            ASR@3       6        28
Malicious Advice Defense   PoisonRAG                            ASR@3       13.3     18
Prompt Injection           MMLU (random topology)               ASR (k=1)   16.4     16
Prompt Injection Defense   CSQA                                 ASR@3       26.3     16
Prompt Injection Defense   GSM8K PI (Prompt Injection) (test)   ASR@1       3.7      16

(10 of 13 rows shown)
