AgentSafe: Safeguarding Large Language Model-based Multi-agent Systems via Hierarchical Data Management

About

Large Language Model (LLM)-based multi-agent systems (MAS) are revolutionizing autonomous communication and collaboration, yet they remain vulnerable to security threats such as unauthorized access and data breaches. To address this, we introduce AgentSafe, a novel framework that enhances MAS security through hierarchical information management and memory protection. AgentSafe classifies information by security level and restricts access to sensitive data to authorized agents. It incorporates two components: ThreatSieve, which secures communication by verifying information authority and preventing impersonation, and HierarCache, an adaptive memory management system that defends against unauthorized access and malicious poisoning, representing the first systematic defense for agent memory. Experiments across various LLMs show that AgentSafe significantly boosts system resilience, achieving defense success rates above 80% under adversarial conditions. AgentSafe also scales well, maintaining robust performance as the number of agents and the complexity of information grow. These results underscore the effectiveness of AgentSafe in securing MAS and its potential for real-world application.
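The abstract describes level-based access control, impersonation checks, and protected memory only at a high level. The following is a minimal sketch of that general idea under stated assumptions. It is not the authors' ThreatSieve or HierarCache implementation. It assumes a linear ordering of three security levels, a per-agent HMAC key so that a message claiming to come from an agent can be checked against that agent's key, and a memory store that only accepts verified messages and filters reads by the reader's clearance. All names (`Level`, `Registry`, `TieredMemory`) are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical linear ordering of security levels (higher = more sensitive)."""
    PUBLIC = 0
    INTERNAL = 1
    SECRET = 2


@dataclass
class Message:
    sender: str
    level: Level
    content: str
    tag: str  # HMAC over (sender, level, content); a forged sender fails verification


class Registry:
    """Hypothetical store of each agent's clearance and signing key."""

    def __init__(self) -> None:
        self.clearance: dict[str, Level] = {}
        self._keys: dict[str, bytes] = {}

    def register(self, name: str, clearance: Level, key: bytes) -> None:
        self.clearance[name] = clearance
        self._keys[name] = key

    @staticmethod
    def _mac(key: bytes, sender: str, level: Level, content: str) -> str:
        payload = f"{sender}|{int(level)}|{content}".encode()
        return hmac.new(key, payload, hashlib.sha256).hexdigest()

    def sign(self, sender: str, level: Level, content: str) -> Message:
        tag = self._mac(self._keys[sender], sender, level, content)
        return Message(sender, level, content, tag)

    def verify(self, msg: Message) -> bool:
        """Reject unknown senders, forged tags, and messages above the sender's clearance."""
        key = self._keys.get(msg.sender)
        if key is None:
            return False
        expected = self._mac(key, msg.sender, msg.level, msg.content)
        if not hmac.compare_digest(expected, msg.tag):
            return False
        return msg.level <= self.clearance[msg.sender]


@dataclass
class TieredMemory:
    """Hypothetical shared memory; reads return only entries at or below the reader's clearance."""
    registry: Registry
    entries: list[Message] = field(default_factory=list)

    def write(self, msg: Message) -> bool:
        # Store only verified messages: a crude guard against memory poisoning.
        if self.registry.verify(msg):
            self.entries.append(msg)
            return True
        return False

    def read(self, reader: str) -> list[str]:
        # Unregistered readers get nothing.
        if reader not in self.registry.clearance:
            return []
        cap = self.registry.clearance[reader]
        return [m.content for m in self.entries if m.level <= cap]


if __name__ == "__main__":
    reg = Registry()
    reg.register("planner", Level.SECRET, b"planner-key")
    reg.register("worker", Level.PUBLIC, b"worker-key")
    mem = TieredMemory(reg)
    mem.write(reg.sign("planner", Level.SECRET, "rotation schedule"))
    mem.write(reg.sign("worker", Level.PUBLIC, "task list"))
    print(mem.read("worker"))  # ['task list']
```

In this sketch, a message with a forged tag, or one a low-clearance agent labels above its own level, is rejected before it reaches memory. A real system would also need key distribution and a way to detect poisoned content from legitimately authenticated agents, which this sketch does not attempt.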

Junyuan Mao, Fanci Meng, Yifan Duan, Miao Yu, Xiaojun Jia, Junfeng Fang, Yuxuan Liang, Kun Wang, Qingsong Wen• 2025

Related benchmarks

Task	Dataset	Result
Prompt Injection	MMLU	ASR@3: 39.7	91
Targeted Attack	InjecAgent	ASR@3: 0.3	55
Prompt Injection	CSQA	ASR: 60	36
Trojan Attack	InjecAgent	ASR: 25	36
Prompt Injection	MATH	Attack Success Rate (ASR): 25	36
Malicious Advice Defense	PoisonRAG	ASR: 29.7	36
Prompt Injection Defense	GSM8K PI (Prompt Injection) (test)	ASR@1: 3.7	16
Prompt Injection Defense	PI (CSQA) random topology	ASR@1: 44.6	16
Prompt Injection	MMLU random topology	ASR (k=1): 24.5	16
Prompt Injection Defense	CSQA	ASR@3: 55.6	16

