Defeating Prompt Injections by Design
About
Large Language Models (LLMs) are increasingly deployed in agentic systems that interact with an untrusted environment. However, LLM agents are vulnerable to prompt injection attacks when handling untrusted data. In this paper we propose CaMeL, a robust defense that creates a protective system layer around the LLM, securing it even when underlying models are susceptible to attacks. To operate, CaMeL explicitly extracts the control and data flows from the (trusted) query; therefore, the untrusted data retrieved by the LLM can never impact the program flow. To further improve security, CaMeL uses a notion of a capability to prevent the exfiltration of private data over unauthorized data flows by enforcing security policies when tools are called. We demonstrate effectiveness of CaMeL by solving $77\%$ of tasks with provable security (compared to $84\%$ with an undefended system) in AgentDojo. We release CaMeL at https://github.com/google-research/camel-prompt-injection.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Indirect Prompt Injection Defense Evaluation | AgentDojo TOOLKNOWLEDGE attack suite | Latency (s)105.4 | 24 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ImportantMsgs | Utility (UA)42.52 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Average across attacks | UA39.26 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo ToolKnowledge | Utility Score42.18 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo IgnorePrevious | Utility (UA)42.97 | 22 | |
| Adversarial Robustness against Indirect Prompt Injection | AgentDojo Combined | UA42.4 | 22 | |
| LLM Agent Task Completion | AgentDojo No Attack | Benign Utility38.04 | 22 | |
| Tool-use agent security evaluation | SIREN | Explicit Directive (UA)23.56 | 16 | |
| Agent defense evaluation | AgentDojo | Utility under Attack54.5 | 12 | |
| Indirect Prompt Injection | AgentDojo | Benign Utility29.97 | 12 |