Unveiling Privacy Risks in LLM Agent Memory

About

Large Language Model (LLM) agents have become increasingly prevalent across various real-world applications. They enhance decision-making by storing private user-agent interactions in the memory module for demonstrations, introducing new privacy risks for LLM agents. In this work, we systematically investigate the vulnerability of LLM agents to our proposed Memory EXTRaction Attack (MEXTRA) under a black-box setting. To extract private information from memory, we propose an effective attacking prompt design and an automated prompt generation method based on different levels of knowledge about the LLM agent. Experiments on two representative agents demonstrate the effectiveness of MEXTRA. Moreover, we explore key factors influencing memory leakage from both the agent designer's and the attacker's perspectives. Our findings highlight the urgent need for effective memory safeguards in LLM agent design and deployment.

Bo Wang, Weiyi He, Shenglai Zeng, Zhen Xiang, Yue Xing, Jiliang Tang, Pengfei He• 2025

Related benchmarks

Task	Dataset	Result
Data Extraction Attack	RAP	Attack Success Rate (ASR)63	32
Data Extraction Attack	EHRAgent	Equality (EQ)55	20
Data Extraction Attack	ReAct	EQ41	20
PII Extraction	Private data (test)	Pass@119.64	15
Data Extraction Attack on Agent Memory	EhrAgent (test)	Equality (EQ)55	12
Data Extraction Attack on Agent Memory	ReAct (test)	EQ Score40	12
Data Extraction Attack on Agent Memory	RAP (test)	Equality Score35	12
Adversarial Attack Detection	Privacy Extraction Attack Dataset	Positive Rate93.56	3

Showing 8 of 8 rows

Other info

Follow for update

@wizwand_team Discord