IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents

About

Large language model (LLM) agents are widely deployed in real-world applications, where they leverage tools to retrieve and manipulate external data for complex tasks. However, when interacting with untrusted data sources (e.g., fetching information from public websites), tool responses may contain injected instructions that covertly influence agent behaviors and lead to malicious outcomes, a threat referred to as Indirect Prompt Injection (IPI). Existing defenses typically rely on advanced prompting strategies or auxiliary detection models. While these methods have demonstrated some effectiveness, they fundamentally rely on assumptions about the model's inherent security, which lacks structural constraints on agent behaviors. As a result, agents still retain unrestricted access to tool invocations, leaving them vulnerable to stronger attack vectors that can bypass the security guardrails of the model. To prevent malicious tool invocations at the source, we propose a novel defensive task execution paradigm, called IPIGuard, which models the agents' task execution process as a traversal over a planned Tool Dependency Graph (TDG). By explicitly decoupling action planning from interaction with external data, IPIGuard significantly reduces unintended tool invocations triggered by injected instructions, thereby enhancing robustness against IPI attacks. Experiments on the AgentDojo benchmark show that IPIGuard achieves a superior balance between effectiveness and robustness, paving the way for the development of safer agentic systems in dynamic environments.

Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji• 2025

Related benchmarks

Task	Dataset	Result
Indirect Prompt Injection Defense Evaluation	AgentDojo TOOLKNOWLEDGE attack suite	Latency (s)51.51	24
Adversarial Robustness against Indirect Prompt Injection	AgentDojo IgnorePrevious	Utility (UA)73.92	22
Adversarial Robustness against Indirect Prompt Injection	AgentDojo Combined	UA73.58	22
Adversarial Robustness against Indirect Prompt Injection	AgentDojo ImportantMsgs	Utility (UA)59.3	22
Adversarial Robustness against Indirect Prompt Injection	AgentDojo ToolKnowledge	Utility Score59.64	22
LLM Agent Task Completion	AgentDojo No Attack	Benign Utility73.91	22
Adversarial Robustness against Indirect Prompt Injection	AgentDojo Average across attacks	UA52.58	22
Safeguarding LLM Agents against prompt injection	Banking and Slack (test)	BU (No Attack)78.4	21
Secure LLM Agent Task Completion	AgentDojo	Benign Utility64.95	9
Data Leakage Prevention	AgentLeak n=496	Any Leak Prevention Rate78.63	5

Showing 10 of 11 rows

Other info

Follow for update

@wizwand_team Discord