PropGuard: Safeguarding LLM-MAS via Propagation-Aware Exploration and Remediation

About

LLM-based multi-agent systems (LLM-MAS) have become a promising paradigm for solving complex tasks through role specialization, tool use, memory, and collaborative reasoning. However, these interactions create new security risks: malicious instructions injected through messages, tools, or memories can propagate across agents and rounds, causing system-level compromise. Existing defenses largely rely on local filtering or graph-based anomaly detection, but they often fail to trace fine-grained propagation paths or to remediate contaminated states without disrupting benign collaboration. We propose PropGuard, a propagation-aware framework for safeguarding LLM-MAS. PropGuard constructs a dual-view spatio-temporal graph that combines response-centric risk estimation with full-state evidence preservation. Guided by these risk priors, a GE-GRPO-trained inspector sequentially explores the full-state graph to recover compact suspicious propagation subgraphs. PropGuard then verifies harmful propagation through subgraph-aware diagnosis and applies source-guided remediation to correct upstream contamination and replay affected downstream interactions. Experiments across four communication architectures and five attack settings demonstrate that PropGuard consistently lowers attack success while maintaining high task-level defense success, achieving a favorable effectiveness-efficiency trade-off.
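To make the "replay affected downstream interactions" step concrete, the following is a minimal sketch, not PropGuard's actual implementation. It assumes the interaction trace is stored as a directed graph whose nodes are hypothetical (agent, round) pairs, and that the nodes needing replay are those reachable from a diagnosed contamination source. The function name `downstream_of` and the toy trace are illustrative only.

```python
from collections import deque


def downstream_of(edges, sources):
    """Return every node reachable from the source nodes, excluding the sources.

    Assumption: `edges` is a list of (src, dst) pairs over (agent, round) nodes,
    covering both message edges and an agent's own state carried across rounds.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    seen = set(sources)
    queue = deque(sources)
    while queue:  # plain breadth-first traversal
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - set(sources)


# Hypothetical trace with three agents; agent A is contaminated at round 0.
edges = [
    (("A", 0), ("B", 1)),  # A's round-0 message reaches B in round 1
    (("A", 0), ("A", 1)),  # A's own state carries over to round 1
    (("B", 1), ("C", 2)),  # B forwards to C in round 2
    (("C", 0), ("C", 1)),  # C's early state is untouched by A
]
to_replay = downstream_of(edges, [("A", 0)])
```

Here `to_replay` contains only the interactions that lie downstream of the contaminated node, so C's early rounds are left alone. This reflects the stated goal of remediating without disrupting benign collaboration, though how PropGuard actually selects and replays nodes is not specified in this summary.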

Bingyu Yan, Xiaoming Zhang, Jinyu Hou, Chaozhuo Li, Ziyi Zhou, Xiaozhe Zhang, Litian Zhang • 2026

Related benchmarks

Task	Dataset	Result
Prompt Injection	MMLU	ASR@3: 12	91
Malicious Advice Defense	PoisonRAG	ASR: 5	36
Prompt Injection	MATH	ASR: 6	36
Prompt Injection	CSQA	ASR: 21	36
Trojan Attack	InjecAgent	ASR: 9	36
Prompt Injection	MMLU random topology	--	16
Memory Attack Defense	PoisonRAG random architecture	ASR: 7.7	6
Prompt Injection Defense	CSQA random architecture	ASR: 18.3	6
Prompt Injection Defense	MATH random architecture	ASR: 6	6
Tool Attack Defense	InjecAgent random architecture	ASR: 7.3	6
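For readers unfamiliar with the metric, here is a minimal sketch of how an attack success rate (ASR) is commonly computed. It assumes ASR is the percentage of attack trials judged successful. The exact evaluation protocol behind the numbers above, including any per-round variant, is not given on this page, and the function name and trial data are hypothetical.

```python
def attack_success_rate(outcomes):
    """Percentage of attack trials judged successful.

    Assumption: `outcomes` is a non-empty list of booleans,
    True meaning the attack succeeded in that trial.
    """
    if not outcomes:
        raise ValueError("need at least one trial")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical data: one successful attack out of four trials.
trials = [True, False, False, False]
asr = attack_success_rate(trials)
```

Lower values indicate a stronger defense under this reading.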
