When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

About

Personalized LLM agents maintain persistent cross-session state to support long-horizon collaboration. This persistence introduces a subtle but critical security vulnerability: routine user-agent interactions can gradually reshape an agent's long-term state, inadvertently weakening future confirmation boundaries, expanding tool-use defaults, and escalating autonomous behavior over time. We formalize this risk as unintended long-term state poisoning. To study it systematically, we introduce the Unintended Long-Term State Poisoning Bench (ULSPB), a bilingual benchmark comprising 350 settings spanning five assistance categories, seven interaction patterns, 24-turn routine interactions, and matched single-injection counterparts. We also define the Harm Score (HS), a state-centric metric that quantifies authorization drift, tool-use escalation, and unchecked autonomy. Experiments on OpenClaw with four backbone LLMs demonstrate that, while single-injection is generally effective, routine conversations alone can substantially poison long-term state, primarily corrupting memory-centric artifacts. Evaluations seeded with real-world user interactions confirm that this risk is not a mere artifact of synthetic prompts. To mitigate this threat, we propose StateGuard, a lightweight, post-execution defense that audits state diffs at the writeback boundary and selectively rolls back dangerous edits. Across all evaluated models, StateGuard reduces HS to near zero and lowers false-negative rates, with minimal overhead; its false-positive rates are high, which is acceptable under a safety-first writeback defense.
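The writeback-boundary idea described above can be illustrated with a small sketch. This is not the authors' StateGuard implementation: the state representation (a flat mapping from artifact name to text), the `RISKY_MARKERS` phrase list, and the function names `diff_state`, `is_dangerous`, and `guarded_writeback` are all hypothetical, chosen only to show how a post-execution audit might diff the proposed state against the previous one and roll back only the flagged edits.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical phrases suggesting a weakened confirmation boundary.
# The abstract does not give the real audit criteria; this is a stand-in.
RISKY_MARKERS = ("without asking", "skip confirmation", "always allow")


@dataclass(frozen=True)
class Edit:
    key: str
    old: str | None  # None: the artifact did not exist before this session
    new: str | None  # None: the artifact was deleted in this session


def diff_state(before: dict[str, str], after: dict[str, str]) -> list[Edit]:
    """Return one Edit per artifact whose content changed."""
    keys = sorted(set(before) | set(after))
    return [
        Edit(k, before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    ]


def is_dangerous(edit: Edit) -> bool:
    """Toy auditor: flag edits whose new text contains a risky phrase."""
    text = (edit.new or "").lower()
    return any(marker in text for marker in RISKY_MARKERS)


def guarded_writeback(
    before: dict[str, str], after: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Commit the proposed state, reverting only the edits the auditor flags."""
    committed = dict(after)
    rolled_back: list[str] = []
    for edit in diff_state(before, after):
        if is_dangerous(edit):
            if edit.old is None:
                committed.pop(edit.key, None)  # drop a newly created artifact
            else:
                committed[edit.key] = edit.old  # restore the previous content
            rolled_back.append(edit.key)
    return committed, rolled_back
```

In this sketch, benign edits made in the same session are still committed, so one flagged artifact does not discard the whole writeback. A keyword auditor this simple would miss most real drift; it stands in for whatever audit the defense actually performs.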

Xiaoyu Xu, Minxin Du, Qipeng Xie, Haobin Ke, Qingqing Ye, Haibo Hu • 2026

Related benchmarks

Task                                   Dataset                                        Result          Rank
Safety Guardrailing                    ULSPB 350 interaction runs                     HS Rate: 0.02   24
Long-term state poisoning evaluation   OpenClaw Injection Tool                        --              8
Long-term state poisoning evaluation   OpenClaw Routine                               --              4
Long-term state poisoning evaluation   OpenClaw Log Replay                            --              4
Long-term state poisoning evaluation   OpenClaw Web Content                           --              4
Long-term state poisoning evaluation   OpenClaw Average across conversation variants  --              4
Long-term state poisoning evaluation   OpenClaw EN                                    --              4
Long-term state poisoning evaluation   OpenClaw ZH                                    --              4