Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

MAGE: Safeguarding LLM Agents against Long-Horizon Threats via Shadow Memory

About

As large language model (LLM)-powered agents are increasingly deployed to perform complex, real-world tasks, they face a growing class of attacks that exploit extended user-agent-environment interactions to pursue malicious objectives improbable in single-turn settings. Such long-horizon threats pose significant risks to the safe deployment of LLM agents in critical domains. In this paper, we present MAGE (Memory As Guardrail Enforcement), a novel defensive framework designed to counter a wide range of long-horizon threats. Inspired by the "shadow stack" abstraction in systems security, MAGE maintains a dedicated, safety-focused agentic memory that distills and retains safety-critical context across the agent's full execution trajectory, leveraging this shadow memory to proactively assess the risk of pending actions prior to their execution. Extensive evaluation demonstrates that MAGE substantially outperforms existing defenses across diverse long-horizon threats in detection accuracy, achieves early-stage detection for the majority of attacks, and introduces only negligible overhead to agent utility. To our best knowledge, MAGE represents the first framework to detect and mitigate long-horizon threats using an agentic memory approach, establishing a new paradigm for this critical challenge and opening promising directions for future research.

Yuhui Wang, Tanqiu Jiang, Jiacheng Liang, Charles Fleming, Ting Wang• 2026

Related benchmarks

TaskDatasetResultRank
Agent SafetyR-Judge
Accuracy92.56
92
Trajectory-level safety evaluationR-judge (test)
Accuracy91.95
32
Safeguarding LLM Agents against prompt injectionBanking and Slack (test)
BU (No Attack)86.5
21
Safety ClassificationASSEBench
Accuracy86.47
20
Safety ClassificationPre-Ex-Bench
Accuracy90.87
20
Safety ClassificationASSEBench (test)
Accuracy88.86
12
Safety ClassificationPre-Ex-Bench (test)
Accuracy80.08
12
Showing 7 of 7 rows

Other info

Follow for update