
Progent: Programmable Privilege Control for LLM Agents

About

LLM agents use Large Language Models as their central components, combined with diverse tools, to complete various user tasks, but they face significant security risks when interacting with external environments. Attackers can exploit these agents through various vectors, including indirect prompt injection, memory/knowledge base poisoning, and malicious tools, tricking agents into performing dangerous actions such as unauthorized financial transactions or data leakage. The core problem that enables attacks to succeed is over-privileged tool access. We introduce Progent, the first privilege control framework for securing LLM agents. Progent enforces security at the tool level by restricting agents to the tool calls necessary for the user's task while blocking potentially malicious ones. Progent features a domain-specific language for expressing fine-grained policies that control tool privileges, flexible fallback actions when calls are blocked, and dynamic policy updates that adapt to changing agent states. The framework operates deterministically at runtime, providing provable security guarantees. Thanks to its modular design, integrating Progent does not alter agent internals and requires only minimal changes to the existing agent implementation, enhancing its practicality and potential for widespread adoption. Our extensive evaluation across various agent use cases, using benchmarks like AgentDojo, ASB, and AgentPoison, demonstrates that Progent reduces attack success rates to 0% while preserving agent utility and speed. Additionally, we show that LLMs can automatically generate effective policies, highlighting their potential for automating the writing of Progent's security policies.
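The abstract describes deterministic, tool-level enforcement: a policy whitelists the tool calls a task needs, constrains their arguments, and triggers a fallback action when a call is blocked. A minimal Python sketch of this idea follows; the policy format, tool names, and function names here are illustrative assumptions, not Progent's actual DSL or API.

```python
# Hypothetical sketch of tool-level privilege control in the spirit of Progent.
# The policy format and all names below are illustrative, not Progent's real DSL.
import fnmatch

# A policy whitelists tools for the current user task and constrains their
# arguments with glob patterns; any call not matched by a rule is blocked.
POLICY = {
    "read_file": {"path": "docs/*"},        # may only read files under docs/
    "send_email": {"to": "*@example.com"},  # may only email in-domain recipients
}

def check_call(tool, args, fallback=lambda t, a: f"BLOCKED: {t}"):
    """Deterministically allow or block a tool call before it executes.

    Returns (allowed, fallback_result); fallback_result is None when allowed.
    """
    rule = POLICY.get(tool)
    if rule is None:                       # tool not whitelisted for this task
        return False, fallback(tool, args)
    for arg, pattern in rule.items():      # every constrained argument must match
        if not fnmatch.fnmatch(str(args.get(arg, "")), pattern):
            return False, fallback(tool, args)
    return True, None

# An in-scope call passes; an injected out-of-scope call is blocked.
allowed, _ = check_call("read_file", {"path": "docs/readme.md"})
blocked, msg = check_call("send_email", {"to": "attacker@evil.com"})
```

Because the check runs outside the LLM, an injected instruction cannot talk its way past it; dynamic policy updates would correspond to swapping `POLICY` as the agent's task state changes.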

Tianneng Shi, Jingxuan He, Zhun Wang, Hongwei Li, Linyu Wu, Wenbo Guo, Dawn Song • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Indirect Prompt Injection | AgentDojo | Benign Utility: 63.42 | 12 |
| Accidental Disclosure | CFH-Hard Accidental | Accuracy (CFH-Hard Accidental): 89 | 8 |
| Attack Success Rate | AgentDojo Slack environment | IA Success Rate: 0.00 | 8 |
| Coding CFH (reverse shell) attack | Coding CFH Original | Generation Success Rate: 80 | 8 |
| Computer Use Control-Flow Hijacking | CFH-Hard Computer Use | Gen. Rate: 67 | 8 |
| Indirect Prompt Injection Attack | payloads Original | Attack Success Rate (IA): 10 | 8 |
| Indirect Prompt Injection Attack | CFH Hard Coding | Attack Success Rate (IA): 7 | 8 |
| Indirect Prompt Injection Attack | CFH-Hard Computer Use | Attack Success Rate (IA): 69 | 8 |
| Coding CFH (reverse shell) attack | CFH Hard Coding | Generation Success Rate: 80 | 8 |
| LLM agent prompt injection defense evaluation | AgentDojo without attack | Total Tokens (M): 2.6 | 8 |
Showing 10 of 15 rows
