Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

About

A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong• 2023

Related benchmarks

TaskDatasetResultRank
Knowledge Graph Question AnsweringCWQ--
212
Indirect Prompt Injection AttackIPI-3k
ASR37.4
90
Indirect Prompt InjectionAmazon Reviews
ASR16.4
47
Indirect Prompt InjectionHotpotQA
ASR84.4
42
Indirect Prompt InjectionMulti-News
ASR87.5
42
Prompt InjectionOpenPromptInjection
ASVh73.2
40
Knowledge Graph Question AnsweringWebQSP
Accuracy (ACC)67
28
Adversarial AttackNQ
ASR48
24
Instruction Injection Attack on Web Browser AgentGitLab Short
UUA100
16
Instruction Injection Attack on Web Browser AgentGitLab Long
UUA83.33
16
Showing 10 of 49 rows

Other info

Follow for update