G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems

About

Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks, ranging from collaborative problem-solving to autonomous decision-making. However, as these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors has raised significant concerns. To address this challenge, we introduce G-Safeguard, a topology-guided security lens and treatment for robust LLM-MAS, which leverages graph neural networks to detect anomalies on the multi-agent utterance graph and employs topological interventions for attack remediation. Extensive experiments demonstrate that G-Safeguard: (I) exhibits significant effectiveness under various attack strategies, recovering over 40% of performance under prompt injection; (II) is highly adaptable to diverse LLM backbones and large-scale MAS; (III) can be seamlessly combined with mainstream MAS while providing security guarantees. The code is available at https://github.com/wslong20/G-safeguard.
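The abstract describes a two-stage pipeline: score nodes of the multi-agent utterance graph with a graph neural network, then intervene on the topology so that flagged messages stop propagating. The snippet below is a minimal, hypothetical Python sketch of that idea; the class and function names (AnomalyGNN, remediate), the feature/adjacency layout, and the threshold are illustrative assumptions, not the authors' implementation, which lives in the repository linked above.

```python
# Hypothetical sketch of a topology-guided detect-and-remediate loop:
# (1) embed agent utterances as node features on an utterance graph,
# (2) score each node with a small GNN-style anomaly detector,
# (3) prune outgoing edges of suspicious nodes before the next round.
import torch
import torch.nn as nn


class AnomalyGNN(nn.Module):
    """One round of neighbor aggregation followed by a per-node anomaly score."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.msg = nn.Linear(dim, hidden)
        self.upd = nn.Linear(dim + hidden, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (N, dim) embeddings of agent utterances
        # adj: (N, N)   adjacency of the utterance graph (adj[i, j] = 1 if
        #               the message of agent i is delivered to agent j)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = (adj @ self.msg(x)) / deg                  # mean-aggregate neighbor messages
        h = torch.relu(self.upd(torch.cat([x, neigh], dim=-1)))
        return torch.sigmoid(self.score(h)).squeeze(-1)    # (N,) anomaly probability


def remediate(adj: torch.Tensor, scores: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Topological intervention: cut edges leaving flagged nodes so a
    compromised agent's messages no longer reach the rest of the system."""
    flagged = scores > threshold
    pruned = adj.clone()
    pruned[flagged, :] = 0.0                               # drop outgoing edges of flagged nodes
    return pruned


if __name__ == "__main__":
    n_agents, dim = 5, 32                                  # toy sizes for illustration only
    x = torch.randn(n_agents, dim)
    adj = (torch.rand(n_agents, n_agents) > 0.5).float()
    detector = AnomalyGNN(dim)
    scores = detector(x, adj)                              # per-utterance anomaly scores
    safe_adj = remediate(adj, scores)                      # topology used in the next round
```

In practice the detector would be trained on attacked versus benign dialogue traces, and the pruned adjacency would be fed back as the communication topology for the next round of the MAS.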

Shilong Wang, Guibin Zhang, Miao Yu, Guancheng Wan, Fanci Meng, Chongye Guo, Kun Wang, Yang Wang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Prompt Injection | MMLU | ASR@3 | 17 | 31 |
| Targeted Attack | InjecAgent | ASR@3 | 9.21 | 31 |
| Prompt Injection | CSQA | ASR@3 | 18.33 | 28 |
| Prompt Injection | GSM8K | ASR@3 | 6 | 28 |
| Malicious Agent | CSQA | ASR@3 | 0.0867 | 28 |
| Malicious Agent | PoisonRAG | ASR@3 | 6 | 28 |
| Group Collusive Attack Detection | GSM8K | Detection Accuracy | 91.46 | 27 |
| Group Collusive Attack Detection | MMLU | Detection Accuracy | 87.2 | 27 |
| Group Collusive Attack Detection | MultiArith | Detection Accuracy | 89.2 | 27 |
| Group Collusive Attack Detection | HumanEval | Detection Accuracy | 91.2 | 27 |
(Showing 10 of 22 benchmark rows.)

Other info

Code: https://github.com/wslong20/G-safeguard
