StruQ: Defending Against Prompt Injection with Structured Queries

About

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model into deviating from the original application's instructions and instead follow user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate prompts and user data. We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instruction-tuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility. Our code is released at https://github.com/Sizhe-Chen/StruQ.

Sizhe Chen, Julien Piet, Chawin Sitawarin, David Wagner• 2024

Related benchmarks

Task	Dataset	Result
Prompt Injection Defense	Inj-SQuAD	Combined ASR0.11	123
Prompt Injection Prevention	Alpaca-Farm	ASR0.96	105
Question Answering	TriviaQA	Accuracy76.22	41
Prompt Injection Attack	Direct Scenario	ASR2.88	28
Agent Task Performance	AgentDojo Travel	Attack Success Rate7.14	24
Prompt Injection Prevention	NQ simplified	Naïve Success Rate3	24
Indirect Prompt Injection Defense	Inj-TriviaQA	Naive ASR0.11	21
Prompt Injection Defense	Indirect Prompt Injection Middle 1.0	Naive ASR0.11	18
Prompt Injection Defense	Indirect Prompt Injection Tail 1.0	ASR Naive0.11	18
Agent Task Performance	AgentDojo Banking	Attack Success Rate61.81	18

Showing 10 of 45 rows

Other info

Follow for update

@wizwand_team Discord