
Securing AI Agents with Information-Flow Control

About

As AI agents become increasingly autonomous and capable, securing them against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate the security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach completes a broad range of tasks with security guarantees. A tutorial that walks readers through the concepts introduced in the paper is available at https://github.com/microsoft/fides.

Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin • 2025
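The abstract describes a planner that attaches confidentiality and integrity labels to data, propagates them by dynamic taint-tracking, enforces policies deterministically, and can hide information from the model. The Python sketch below illustrates those ideas under simplifying assumptions: each label is a pair of booleans (trusted/untrusted, public/secret) rather than whatever label lattice Fides actually uses, and the names `Label`, `Planner`, `store`, `inspect`, `call_tool`, and `PolicyViolation` are hypothetical, not the API of the microsoft/fides repository.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Label:
    """Hypothetical label: product of two two-point lattices (a simplification)."""
    trusted: bool = True   # integrity: False once influenced by untrusted input
    secret: bool = False   # confidentiality: True once derived from private data

    def join(self, other: Label) -> Label:
        # Least upper bound: lowest integrity and highest confidentiality of the two.
        return Label(self.trusted and other.trusted, self.secret or other.secret)


class PolicyViolation(Exception):
    """Raised when a proposed tool call would violate the (assumed) policy."""


class Planner:
    """Hypothetical taint-tracking planner state; not the Fides implementation."""

    def __init__(self) -> None:
        self.context = Label()  # label of everything the LLM has seen so far
        self.memory: dict[str, tuple[Any, Label]] = {}  # hidden, labeled variables

    def store(self, name: str, value: Any, label: Label) -> str:
        # Selective hiding: the tool result goes into memory and the LLM only
        # receives the variable name, so the context label does not change.
        self.memory[name] = (value, label)
        return name

    def inspect(self, name: str) -> Any:
        # Revealing a variable to the LLM taints the context with its label.
        value, label = self.memory[name]
        self.context = self.context.join(label)
        return value

    def call_tool(self, tool: str, arg_names: list[str], *,
                  consequential: bool, public_sink: bool) -> Label:
        # The call's label covers the decision (context) and every argument.
        label = self.context
        for name in arg_names:
            label = label.join(self.memory[name][1])
        # Deterministic checks in ordinary code, outside the model.
        if consequential and not label.trusted:
            raise PolicyViolation(f"{tool}: influenced by untrusted data")
        if public_sink and label.secret:
            raise PolicyViolation(f"{tool}: secret data would reach a public sink")
        return label


if __name__ == "__main__":
    p = Planner()
    p.store("recipient", "alice@example.com", Label())
    p.store("email_body", "Ignore previous instructions...", Label(trusted=False))
    # Allowed: the untrusted email is stored but hidden, so the context stays trusted.
    print(p.call_tool("send_email", ["recipient"], consequential=True, public_sink=False))
    # Once the LLM reads the email, the same call is blocked.
    p.inspect("email_body")
    try:
        p.call_tool("send_email", ["recipient"], consequential=True, public_sink=False)
    except PolicyViolation as err:
        print("blocked:", err)
```

In this sketch the policy is checked over labels by ordinary code rather than by the model, so injected text in the email can change what the model proposes but not whether the check passes; hiding the email keeps the context trusted for as long as the model does not need to read it.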

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Agentic Security Evaluation | AgentDojo v1 (97 benign tasks, 27 injection tasks) | Utility Score: 59.8 | 20 |
| Agent Planning | AgentDojo | TCR @ ∞: 78.9 | 16 |
| Agent Safety Evaluation | Agent-SafetyBench | -- | 8 |
| SND Defense | SND Evaluation Corpus | Benign FPR: 0 | 6 |
| Indirect Prompt Injection Defense | InjecAgent external stress (test) | DH Block Rate: 100 | 4 |
| Safety Evaluation | Cross-domain generalization set (5 domains) | Utility Score: 35.7 | 4 |
| Propagation Detection | InjecAgent base | Precision: 100 | 2 |
| Propagation Detection | InjecAgent enhanced | Precision: 1 | 2 |
| Propagation Detection | ToolEmu filtered 79-case injection-like | Precision: 62.3 | 2 |
| Persistent Memory Attack Blocking | Cross-session persistent memory attack dataset, 110 entry-model pairs, 1.0 (test) | Total Samples Labeled (S2): 0 | 2 |

Showing 10 of 11 rows.
