Formalizing and Benchmarking Prompt Injection Attacks and Defenses

About

A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong• 2023

Related benchmarks

Task	Dataset	Result
Knowledge Graph Question Answering	CWQ	--	212
Indirect Prompt Injection Attack	IPI-3k	ASR37.4	90
Indirect Prompt Injection	Amazon Reviews	ASR16.4	47
Indirect Prompt Injection	HotpotQA	ASR84.4	42
Indirect Prompt Injection	Multi-News	ASR87.5	42
Prompt Injection	OpenPromptInjection	ASVh73.2	40
Knowledge Graph Question Answering	WebQSP	Accuracy (ACC)67	28
Adversarial Attack	NQ	ASR48	24
Instruction Injection Attack on Web Browser Agent	GitLab Short	UUA100	16
Instruction Injection Attack on Web Browser Agent	GitLab Long	UUA83.33	16

Showing 10 of 49 rows

Other info

Follow for update

@wizwand_team Discord