Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Formalizing and Benchmarking Prompt Injection Attacks and Defenses

About

A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-Integrated Application such that it produces results as an attacker desires. Existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a framework to formalize prompt injection attacks. Existing attacks are special cases in our framework. Moreover, based on our framework, we design a new attack by combining existing ones. Using our framework, we conduct a systematic evaluation on 5 prompt injection attacks and 10 defenses with 10 LLMs and 7 tasks. Our work provides a common benchmark for quantitatively evaluating future prompt injection attacks and defenses. To facilitate research on this topic, we make our platform public at https://github.com/liu00222/Open-Prompt-Injection.

Yupei Liu, Yuqi Jia, Runpeng Geng, Jinyuan Jia, Neil Zhenqiang Gong• 2023

Related benchmarks

TaskDatasetResultRank
Indirect Prompt Injection AttackIPI-3k
ASR37.4
90
Indirect Prompt InjectionAmazon Reviews
ASR16.4
47
Indirect Prompt InjectionHotpotQA
ASR84.4
42
Indirect Prompt InjectionMulti-News
ASR87.5
42
Toxic Comment DetectionToxic Comment
ASR18.1
14
Negative Review DetectionNegative Review
ASR11
14
Spam Email DetectionSpam Email
ASR37.9
14
Prompt Injection AttackNavGPT (test)
Navigation Error7.51
12
Negative Review ClassificationNegative Review
Tokens Used12.1
10
Prompt InjectionNegative Review
ASR (None Defense)0.3
10
Showing 10 of 15 rows

Other info

Follow for update