Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

PromptArmor: Simple yet Effective Prompt Injection Defenses

About

Despite their potential, recent research has demonstrated that LLM agents are vulnerable to prompt injection attacks, where malicious prompts are injected into the agent's input, causing it to perform an attacker-specified task rather than the intended task provided by the user. In this paper, we present PromptArmor, a simple yet effective defense against prompt injection attacks. Specifically, PromptArmor prompts an off-the-shelf LLM to detect and remove potential injected prompts from the input before the agent processes it. Our results show that PromptArmor can accurately identify and remove injected prompts. For example, using GPT-4o, GPT-4.1, or o4-mini, PromptArmor achieves both a false positive rate and a false negative rate below 1% on the AgentDojo benchmark. Moreover, after removing injected prompts with PromptArmor, the attack success rate drops to below 1%. We also demonstrate PromptArmor's effectiveness against adaptive attacks and explore different strategies for prompting an LLM. We recommend that PromptArmor be adopted as a standard baseline for evaluating new defenses against prompt injection attacks.

Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, Dawn Song• 2025

Related benchmarks

TaskDatasetResultRank
Prompt Injection PreventionAlpaca-Farm--
105
Computer UseOSWorld
OS Success Rate35.2
42
Question AnsweringSQuAD v2
ASR Score1
36
Question AnsweringDolly Closed QA
ASR100
36
Prompt Injection PreventionNQ simplified
Naïve Success Rate28
24
Indirect Prompt Injection Defense EvaluationAgentDojo TOOLKNOWLEDGE attack suite
Latency (s)17.68
24
Prompt Injection DefenseAgentDojo New Attack 2
Utility under Attack (UA)83.77
23
Prompt Injection DefenseAgentDojo Important Instructions
Utility under Attack0.8335
23
Prompt Injection DefenseAgentDojo New Attack 1
Utility under Attack83.67
23
Prompt Injection DefenseAgentDojo No Attack
Benign Utility77.32
23
Showing 10 of 40 rows

Other info

Follow for update