BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks

About

The security of LLM-based multi-agent systems (MAS) is critically threatened by the propagation vulnerability, whereby malicious agents can distort collective decision-making through inter-agent message interactions. While existing supervised defense methods demonstrate promising performance, they may be impractical in real-world scenarios because they rely heavily on labeled malicious agents to train a supervised detection model. To enable practical and generalizable MAS defenses, we propose BlindGuard, an unsupervised defense method that requires neither attack-specific labels nor prior knowledge of malicious behaviors. To this end, we build a hierarchical agent encoder that captures the individual, neighborhood, and global interaction patterns of each agent, giving a comprehensive basis for malicious agent detection. We further design a corruption-guided detector that combines directional noise injection with contrastive learning, so that the detection model can be trained solely on normal agent behaviors. Extensive experiments show that BlindGuard effectively detects diverse attack types (prompt injection, memory poisoning, and tool attacks) across MAS with various communication patterns, while generalizing better than supervised baselines. The code is available at: https://github.com/MR9812/BlindGuard.
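The abstract only names the two components, so the following is a hypothetical toy sketch of the general idea, not BlindGuard's implementation (the linked repository is the authoritative source). Assumptions made here: each agent is represented by a fixed embedding vector; the "hierarchical encoder" is reduced to an agent's own vector plus the mean of its neighbors and the system-wide mean; "directional noise injection" is modeled as shifting an agent against its neighborhood consensus; and the contrastive objective is a Deep Graph Infomax-style bilinear discriminator trained with binary cross-entropy. The names `hierarchical_views`, `train_detector`, and `anomaly_scores` are illustrative only.

```python
import numpy as np


def hierarchical_views(X, A):
    """Individual view plus (neighborhood, global) context for each agent.

    X: (n, d) agent embeddings, e.g. embeddings of each agent's latest message
       (an assumption made for this sketch).
    A: (n, n) 0/1 adjacency matrix of the communication graph, no self-loops.
    """
    deg = A.sum(axis=1, keepdims=True)
    neigh = (A @ X) / np.maximum(deg, 1.0)           # mean of neighbors' embeddings
    glob = np.tile(X.mean(axis=0), (X.shape[0], 1))  # system-wide mean, one row per agent
    return X, np.concatenate([neigh, glob], axis=1)  # (n, d) self view, (n, 2d) context


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_detector(X, A, alpha=2.0, lr=0.1, epochs=200, seed=0):
    """Fit a bilinear agent-vs-context discriminator on NORMAL agents only.

    Positives: each clean agent paired with its own context.
    Negatives: the same agent after hypothetical "directional" corruption,
    i.e. shifted by `alpha` along the direction opposing its neighborhood mean.
    """
    rng = np.random.default_rng(seed)
    h, ctx = hierarchical_views(X, A)
    d = X.shape[1]
    neigh = ctx[:, :d]
    direction = neigh / np.maximum(np.linalg.norm(neigh, axis=1, keepdims=True), 1e-8)
    h_neg = h - alpha * direction
    W = 0.01 * rng.standard_normal((d, 2 * d))
    for _ in range(epochs):
        for hh, label in ((h, 1.0), (h_neg, 0.0)):
            p = _sigmoid(np.einsum("nd,de,ne->n", hh, W, ctx))
            # gradient of binary cross-entropy w.r.t. W: mean over agents of (p - y) * h c^T
            grad = np.einsum("n,nd,ne->de", p - label, hh, ctx) / len(hh)
            W -= lr * grad
    return W


def anomaly_scores(W, X, A):
    """Probability that each agent does NOT match its context (higher = more suspicious)."""
    h, ctx = hierarchical_views(X, A)
    return 1.0 - _sigmoid(np.einsum("nd,de,ne->n", h, W, ctx))


# Toy demo: 6 fully connected agents whose benign embeddings share one "topic" direction.
rng = np.random.default_rng(42)
n, d = 6, 4
A = np.ones((n, n)) - np.eye(n)
topic = np.eye(d)[0]
X_clean = topic + 0.1 * rng.standard_normal((n, d))
W = train_detector(X_clean, A)  # trained without any malicious examples

X_test = topic + 0.1 * rng.standard_normal((n, d))
X_test[2] = -topic  # agent 2 now argues against the consensus
scores = anomaly_scores(W, X_test, A)
print(np.round(scores, 3))
```

Only clean agents are seen during training; the deviating agent appears only at test time, which mirrors the label-free setting described above. A real system would presumably learn the encoder and use richer corruptions instead of fixed means and a single shift direction.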

Rui Miao, Yixin Liu, Yili Wang, Xu Shen, Yue Tan, Yiwei Dai, Shirui Pan, Xin Wang • 2025

Related benchmarks

Task	Dataset	Metric	Result
Prompt Injection	MMLU	ASR@3	18	91
Targeted Attack	InjecAgent	ASR@3	17.69	55
Prompt Injection	GSM8K	ASR@3	6.44	52
Prompt Injection	CSQA	ASR@3	22	52
Malicious Agent	PoisonRAG	ASR@3	14.67	52
Commonsense Reasoning	CSQA	Task Success Rate (TSR)	69	30
Logical Inference	LogiQA	Task Success Rate (TSR)	57.75	30
General Knowledge Question Answering	MMLU	Task Success Rate (TSR)	78.25	30
Malicious Agent	CSQA	ASR@3	0.2033	28
Memory Attack	CSQA	ASR@3	7	24

Showing 10 of 12 rows
