
CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly

About

LLM-based agents are increasingly used for cybersecurity tasks, but most existing systems rely on fixed, human-designed scaffolds that struggle to adapt across diverse targets and failure modes. We introduce CyberEvolver, a self-evolving cybersecurity agent framework that iteratively revises its own scaffold based on experience from failed execution attempts. Self-evolution in cybersecurity is challenging because the space of possible scaffold changes is largely unstructured, execution feedback is sparse and often obscured by the environment, and low-diversity updates can cause errors to compound over repeated iterations. CyberEvolver addresses these challenges with a four-layer evolvable agent architecture that decomposes scaffold optimization into structured components, a trace-to-diagnosis mechanism that converts noisy execution logs into actionable revision signals, and a population-based beam search strategy that preserves diverse agent variants during evolution. We evaluate CyberEvolver on CTF challenges, vulnerability exploitation, and penetration-testing tasks using four open-source LLMs. Across these settings, CyberEvolver improves the seed agent's success rate by 13.6% on average, and outperforms six human-designed cybersecurity agents as well as two self-improvement methods adapted from other domains. These results suggest that scaffold self-evolution is a promising direction for building adaptive LLM agents for security testing.

Yihe Fan, Changyi Li, Lichen Xu, Xudong Pan, Jiarun Dai, Hong Geng, Min Yang • 2026
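
The population-based beam search mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the scaffold representation (a tuple with one integer per layer), the `propose` and `score` functions, and all parameter names are hypothetical stand-ins. In the real system, revisions would presumably come from an LLM diagnosing failed traces and the score from task success; here both are replaced by a deterministic toy so the search loop itself is easy to follow.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for an agent scaffold: one integer "setting" per layer.
Scaffold = Tuple[int, ...]


def beam_search_evolve(
    seed: Scaffold,
    propose: Callable[[Scaffold, int], List[Scaffold]],
    score: Callable[[Scaffold], float],
    beam_width: int = 3,
    branch: int = 2,
    iterations: int = 4,
) -> List[Scaffold]:
    """Generic beam search: keep the best `beam_width` distinct variants per round."""
    beam = [seed]
    for _ in range(iterations):
        # Parents stay in the pool, so a bad revision cannot erase earlier progress.
        candidates = list(beam)
        for parent in beam:
            candidates.extend(propose(parent, branch))
        # Drop exact duplicates (order-preserving) so the beam holds distinct variants.
        unique = list(dict.fromkeys(candidates))
        unique.sort(key=score, reverse=True)
        beam = unique[:beam_width]
    return beam


# Toy problem (assumption): the "ideal" scaffold is known, and the score is the
# negative L1 distance to it. A real score would come from running the agent.
TARGET: Scaffold = (2, 1, 0, 0)


def toy_score(s: Scaffold) -> float:
    return -float(sum(abs(a - b) for a, b in zip(s, TARGET)))


def toy_propose(parent: Scaffold, k: int) -> List[Scaffold]:
    # Revise one layer at a time: increment each of the first k layers by one.
    variants = []
    for i in range(min(k, len(parent))):
        child = list(parent)
        child[i] += 1
        variants.append(tuple(child))
    return variants


final_beam = beam_search_evolve((0, 0, 0, 0), toy_propose, toy_score)
print(final_beam[0], toy_score(final_beam[0]))
```

Keeping several variants alive, rather than only the single best one, is what lets the search recover when a locally attractive revision turns out to be a dead end; the beam width trades evaluation cost against that diversity.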

Related benchmarks

Task                                     Dataset                 Result  Rank
Cybersecurity vulnerability remediation  CVE-Bench (one-day)     --      5
CTF challenges                           NYU-CTF                 --      4
Penetration Testing                      AutoPenBench            --      4
Vulnerability Exploitation               CVEBench Zero-Day       --      4
Vulnerability Exploitation               CVEBench One-Day        --      4
Cybersecurity vulnerability remediation  CVE-Bench               --      2
Cybersecurity vulnerability remediation  CVE-Bench (zero-day)    --      1
Cybersecurity vulnerability remediation  CVE-Bench 1.0           --      1
