Teams of LLM Agents can Exploit Zero-Day Vulnerabilities

About

LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability, and can solve toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. When used alone, prior agents struggle with exploring many different vulnerabilities and with long-range planning. To resolve this, we introduce HPTSA, a system of agents with a planning agent that can launch subagents. The planning agent explores the system and determines which subagents to call, resolving long-term planning issues when trying different vulnerabilities. We construct a benchmark of 14 real-world vulnerabilities and show that our team of agents improves over prior agent frameworks by up to 4.3×.

Yuxuan Zhu, Antony Kellermann, Akul Gupta, Philip Li, Richard Fang, Rohan Bindu, Daniel Kang • 2024

Related benchmarks

Task	Dataset	Result
Automated Vulnerability Exploitation	CVE-Bench zero-day vulnerabilities (test)	Success@17.5	5
Vulnerability Exploitation	CVEBench Zero-Day	--	4
Vulnerability Exploitation	CVEBench One-Day	--	4

