
Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs

About

Large Language Models (LLMs) have been integrated into many applications (e.g., web agents) to perform more sophisticated tasks. However, LLM-empowered applications are vulnerable to Indirect Prompt Injection (IPI) attacks, where instructions are injected via untrustworthy external data sources. This paper presents Rennervate, a defense framework to detect and prevent IPI attacks. Rennervate leverages attention features to detect the covert injection at a fine-grained token level, enabling precise sanitization that neutralizes IPI attacks while maintaining LLM functionalities. Specifically, the token-level detector is materialized with a 2-step attentive pooling mechanism, which aggregates attention heads and response tokens for IPI detection and sanitization. Moreover, we establish a fine-grained IPI dataset, FIPI, to be open-sourced to support further research. Extensive experiments verify that Rennervate outperforms 15 commercial and academic IPI defense methods, achieving high precision on 5 LLMs and 6 datasets. We also demonstrate that Rennervate is transferable to unseen attacks and robust against adaptive adversaries.
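
The 2-step attentive pooling described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`attentive_pool`, `token_scores`, `sanitize`), the scoring rule (a per-head gain vector `head_gain` and a scalar `resp_gain` feeding a softmax), and the fixed `threshold` are all assumptions made for illustration. The sketch assumes an attention tensor of shape (heads, response tokens, input tokens), pools first over heads and then over response tokens to get one score per input token, and drops tokens whose score reaches the threshold.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along one axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attentive_pool(features, gain, axis):
    # Pool `features` along `axis` with softmax weights computed from the
    # features themselves. `gain` is a hypothetical learned parameter:
    # a scalar, or one value per slice along `axis`.
    shape = [1] * features.ndim
    shape[axis] = -1
    scores = features * np.asarray(gain, dtype=float).reshape(shape)
    weights = softmax(scores, axis=axis)
    return (weights * features).sum(axis=axis)


def token_scores(attn, head_gain, resp_gain):
    # attn: (heads, response_tokens, input_tokens)
    per_response = attentive_pool(attn, head_gain, axis=0)   # step 1: heads
    return attentive_pool(per_response, resp_gain, axis=0)   # step 2: responses


def sanitize(tokens, scores, threshold):
    # Keep only input tokens whose score stays below the (assumed) threshold.
    return [tok for tok, s in zip(tokens, scores) if s < threshold]
```

With all gains set to zero the softmax weights are uniform, so the sketch reduces to plain averaging over heads and response tokens. In a trained detector the gains (and a classifier on top of the pooled scores) would be learned from labeled data such as FIPI.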

Yinan Zhong, Qianhao Miao, Yanjiao Chen, Jiangyi Deng, Yushi Cheng, Wenyuan Xu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| IPI Detection | FIPI (test) | Accuracy 99.58 | 42 |
| IPI Sanitization | Jfleg RTE (unseen) | ASR 0.00 | 20 |
| Indirect Prompt Injection Sanitization | IPI Sanitization Esc. | ASR 0.00 | 15 |
| Indirect Prompt Injection Sanitization | IPI Sanitization Ig | ASR 0.00 | 15 |
| Indirect Prompt Injection Sanitization | IPI Sanitization Total | Attack Success Rate (ASR) 0.2 | 15 |
| Indirect Prompt Injection Sanitization | IPI Sanitization Naive | ASR 0.93 | 15 |
| Indirect Prompt Injection Sanitization | IPI Sanitization Cp. | ASR 0.00 | 15 |
| Indirect Prompt Injection Sanitization | IPI Sanitization Cb. | ASR 0.00 | 15 |
| Indirect Prompt Injection Detection | Jfleg RTE | Accuracy 99.4 | 10 |
| IPI Sanitization | MRPC-SST2 (unseen) | ASR 23.9 | 10 |
Showing 10 of 44 rows
