LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

About

AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI) risk: an agent may execute harmful instructions embedded in untrusted inputs such as email, downloaded files, webpages, repositories, or group-chat messages. Existing evaluations are often small, purely simulated, or focused on a narrow set of channels. We introduce LivePI (Live Prompt Injection), a structured benchmark for IPI risk in a production-like but test-controlled environment. LivePI covers seven input surfaces, twelve attack/rendering families, and five malicious goals, including protected-information exfiltration, unauthorized security-control changes, unsafe code retrieval or execution, inbox-summary exfiltration, and cryptocurrency transfer. We run LivePI on a real virtual machine with live but test-controlled email, chat, web, local-file, repository, and wallet interfaces. Across GPT-5.3-Codex, Claude Opus 4.6, Gemini 3.1 Pro, Kimi K2.5, and GLM-5, total attack success rates range from 10.7% to 29.6%. Group-chat injection is uniformly successful across the evaluated backbones in our deployment, and repository-link attacks produce high-severity failures despite a small denominator. We also evaluate a two-layer defense consisting of prompt-level filtering and pre-execution tool-call authorization. In the GPT-5.3-Codex setting, the defense intercepts all tested malicious-goal completions in LivePI before execution while preserving benign utility on PinchBench-derived workloads.
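The abstract names the two defense layers, prompt-level filtering and pre-execution tool-call authorization, but does not specify how either works. Below is a minimal Python sketch of one way such a pipeline could be wired. It assumes a regex-based filter over untrusted content and a per-task user approval set for high-risk tools. The patterns, the tool names (`email.send`, `settings.update`, `shell.exec`, `wallet.transfer`) and the functions `filter_untrusted`, `authorize` and `guarded_run` are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Any, Callable

# Layer 1 (hypothetical): a pattern-based filter applied to untrusted content
# (email bodies, files, webpages, chat messages) before the model sees it.
# These regexes are illustrative only; a deployed filter would likely be a classifier.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"do\s+not\s+(tell|inform)\s+the\s+user", re.IGNORECASE),
    re.compile(r"you\s+are\s+now\s+in\s+\w+\s+mode", re.IGNORECASE),
]


def filter_untrusted(text: str) -> tuple[str, bool]:
    """Return (text_for_model, flagged); flagged text is replaced by a placeholder."""
    if any(p.search(text) for p in INJECTION_PATTERNS):
        return "[untrusted content withheld: possible prompt injection]", True
    return text, False


@dataclass
class ToolCall:
    tool: str  # hypothetical tool name, e.g. "wallet.transfer"
    args: dict[str, Any] = field(default_factory=dict)


# Layer 2 (hypothetical): tools whose misuse maps onto the kinds of malicious
# goals LivePI targets (exfiltration, security-control changes, code execution,
# cryptocurrency transfer). They run only if the user approved them for this task.
HIGH_RISK_TOOLS = {"email.send", "settings.update", "shell.exec", "wallet.transfer"}


def authorize(call: ToolCall, user_approved: set[str]) -> bool:
    """Decide before execution, using only the user's approvals, not model-read content."""
    if call.tool in HIGH_RISK_TOOLS:
        return call.tool in user_approved
    return True


def guarded_run(
    calls: list[ToolCall],
    user_approved: set[str],
    execute: Callable[[ToolCall], Any],
) -> list[tuple[str, str, Any]]:
    """Execute authorized calls; record blocked calls without running them."""
    results = []
    for call in calls:
        if authorize(call, user_approved):
            results.append(("executed", call.tool, execute(call)))
        else:
            results.append(("blocked", call.tool, None))
    return results
```

The design point of the second layer is that its decision never depends on text the agent read. An injection that slips past the regexes can still make the model propose `wallet.transfer`, but the call is blocked unless the user approved that tool for the task. This sketch makes no claim to reproduce the interception or utility results reported above.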

Lei Zhao, Abhay Bhaskar, Edgar Dobriban (2026)

Related benchmarks

Task	Dataset	Result	Models evaluated
Indirect Prompt Injection	LivePI Group chat (n=15)	--	5
Indirect Prompt Injection	LivePI Email (n=50)	--	5
Indirect Prompt Injection	LivePI Local Docs (n=50)	--	5
Indirect Prompt Injection	LivePI Repo Links (n=4)	--	5
Indirect Prompt Injection	LivePI Gist (n=50)	--	5
Indirect Prompt Injection	LivePI Total	--	5
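The per-surface denominators above differ widely (n=4 for Repo Links versus n=50 for Email, Local Docs and Gist), which is why the abstract notes that the repository-link result rests on a small denominator. One standard way to quantify that uncertainty is the Wilson score interval for a binomial proportion; nothing here states that LivePI itself reports such intervals. A minimal Python sketch, where `wilson_interval` is a hypothetical helper name:

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be in [0, n]")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)
```

For a hypothetical surface on which every attack succeeded, `wilson_interval(4, 4)` gives a lower bound of about 0.51, while `wilson_interval(50, 50)` gives about 0.93. The same observed 100% rate is therefore much less certain at n=4 than at n=50. These inputs are illustrative, not LivePI results.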
