When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

About

Personalized LLM agents maintain persistent cross-session state to support long-horizon collaboration. Yet this persistence introduces a subtle but critical security vulnerability: routine user-agent interactions can gradually reshape an agent's long-term state, inadvertently weakening future confirmation boundaries, expanding tool-use defaults, and escalating autonomous behavior over time. We formalize this risk as unintended long-term state poisoning. To study it systematically, we introduce the Unintended Long-Term State Poisoning Bench (ULSPB), a bilingual benchmark comprising 350 settings spanning five assistance categories, seven interaction patterns, 24-turn routine interactions, and matched single-injection counterparts. We also define the Harm Score (HS), a state-centric metric that quantifies authorization drift, tool-use escalation, and unchecked autonomy. Experiments on OpenClaw with four backbone LLMs demonstrate that, while single injection is generally effective, routine conversations alone can substantially poison long-term state, primarily corrupting memory-centric artifacts. Evaluations seeded with real-world user interactions confirm that this risk is not a mere artifact of synthetic prompts. To mitigate this threat, we propose StateGuard, a lightweight, post-execution defense that audits state diffs at the writeback boundary and selectively rolls back dangerous edits. Across all evaluated models, StateGuard reduces HS to near zero and lowers false-negative rates, at the cost of high but acceptable false-positive rates under a safety-first writeback defense, and with minimal overhead.

Xiaoyu Xu, Minxin Du, Qipeng Xie, Haobin Ke, Qingqing Ye, Haibo Hu • 2026
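
The abstract describes StateGuard only at a high level: audit the state diff at the writeback boundary and roll back dangerous edits. The following is a minimal Python sketch of that general idea, not the authors' implementation. The flat key-to-text state layout, the `RISKY_MARKERS` phrase list, and the function names `diff_state`, `is_dangerous`, and `guarded_writeback` are all assumptions made for illustration; the paper's actual audit criteria are not given here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical phrases that suggest a weakened confirmation boundary.
# These are illustrative assumptions, not criteria taken from the paper.
RISKY_MARKERS = (
    "without confirmation",
    "skip confirmation",
    "always allow",
    "no need to ask",
)


@dataclass
class Edit:
    key: str
    old: Optional[str]  # None if the key did not exist before the session
    new: Optional[str]  # None if the key was removed during the session


def diff_state(before: Dict[str, str], after: Dict[str, str]) -> List[Edit]:
    """Return one Edit per key whose value was added, changed, or removed."""
    keys = sorted(set(before) | set(after))
    return [
        Edit(k, before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    ]


def is_dangerous(edit: Edit) -> bool:
    """Flag an edit whose new text contains a risky phrase (toy heuristic).

    Removals are not flagged here; a fuller audit would also need to
    consider deleted safeguards.
    """
    text = (edit.new or "").lower()
    return any(marker in text for marker in RISKY_MARKERS)


def guarded_writeback(
    before: Dict[str, str], after: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Commit the proposed state, reverting only the edits flagged as dangerous."""
    committed = dict(after)
    rolled_back: List[str] = []
    for edit in diff_state(before, after):
        if is_dangerous(edit):
            if edit.old is None:
                committed.pop(edit.key, None)  # drop a newly added risky entry
            else:
                committed[edit.key] = edit.old  # restore the previous value
            rolled_back.append(edit.key)
    return committed, rolled_back
```

Because the rollback is applied per edit rather than per session, benign changes from the same conversation are still written back. A keyword match like this one would produce both false positives and false negatives; it stands in for whatever auditor a real system would use.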

Related benchmarks

Task	Dataset	Result
Safety Guardrailing	ULSPB 350 interaction runs	HS Rate: 0.02	24
Long-term state poisoning evaluation	OpenClaw Injection Tool	--	8
Long-term state poisoning evaluation	OpenClaw Routine	--	4
Long-term state poisoning evaluation	OpenClaw Log Replay	--	4
Long-term state poisoning evaluation	OpenClaw Web Content	--	4
Long-term state poisoning evaluation	OpenClaw Average across conversation variants	--	4
Long-term state poisoning evaluation	OpenClaw EN	--	4
Long-term state poisoning evaluation	OpenClaw ZH	--	4

