CyberEvolver: Structured Self-Evolution for Cybersecurity Agents On the Fly
About
LLM-based agents are increasingly used for cybersecurity tasks, but most existing systems rely on fixed, human-designed scaffolds that struggle to adapt across diverse targets and failure modes. We introduce CyberEvolver, a self-evolving cybersecurity agent framework that iteratively revises its own scaffold based on experience from failed execution attempts. Self-evolution in cybersecurity is challenging because the space of possible scaffold changes is largely unstructured, execution feedback is sparse and often obscured by the environment, and low-diversity updates can cause errors to compound over repeated iterations. CyberEvolver addresses these challenges with a four-layer evolvable agent architecture that decomposes scaffold optimization into structured components, a trace-to-diagnosis mechanism that converts noisy execution logs into actionable revision signals, and a population-based beam search strategy that preserves diverse agent variants during evolution. We evaluate CyberEvolver on CTF challenges, vulnerability exploitation, and penetration-testing tasks using four open-source LLMs. Across these settings, CyberEvolver improves the seed agent's success rate by 13.6% on average, and outperforms six human-designed cybersecurity agents as well as two self-improvement methods adapted from other domains. These results suggest that scaffold self-evolution is a promising direction for building adaptive LLM agents for security testing.
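To make the idea of a population-based beam search over agent variants more concrete, here is a minimal Python sketch. It is not the paper's implementation: the `Variant` class, the four integer "settings" (a placeholder loosely echoing the four-layer scaffold), and the `toy_mutate` and `toy_score` functions are all hypothetical stand-ins. In a real system, mutation would presumably be a diagnosis-driven scaffold revision and the score would come from task executions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical agent scaffold, reduced to a tuple of integer settings."""
    settings: tuple


def beam_search_evolve(seed, mutate, score, beam_width=3,
                       children_per_parent=2, iterations=5, rng=None):
    """Keep the top `beam_width` variants at each step instead of a single best one."""
    rng = rng or random.Random(0)
    beam = [seed]
    for _ in range(iterations):
        # Parents stay in the candidate pool, so the best score never decreases.
        candidates = list(beam)
        for parent in beam:
            for _ in range(children_per_parent):
                candidates.append(mutate(parent, rng))
        # Drop exact duplicates while preserving order, then keep the best few.
        unique = list(dict.fromkeys(candidates))
        beam = sorted(unique, key=score, reverse=True)[:beam_width]
    return beam


def toy_mutate(parent, rng):
    """Assumed mutation: nudge one setting up or down by one."""
    values = list(parent.settings)
    index = rng.randrange(len(values))
    values[index] += rng.choice((-1, 1))
    return Variant(tuple(values))


def toy_score(variant):
    """Assumed fitness: negative distance to an arbitrary target configuration."""
    target = (2, 0, 1, 3)
    return -sum(abs(a - b) for a, b in zip(variant.settings, target))


if __name__ == "__main__":
    seed = Variant((0, 0, 0, 0))
    final_beam = beam_search_evolve(seed, toy_mutate, toy_score,
                                    rng=random.Random(0))
    for variant in final_beam:
        print(variant.settings, toy_score(variant))
```

Keeping several variants in the beam, rather than greedily following one, is what lets the search recover when a single line of revisions stops improving. This is the general motivation behind preserving diversity, though the paper's actual selection criteria may differ.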
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Cybersecurity vulnerability remediation | CVE-Bench (one-day) | -- | 5 |
| CTF challenges | NYU-CTF | -- | 4 |
| Penetration Testing | AutoPenBench | -- | 4 |
| Vulnerability Exploitation | CVEBench Zero-Day | -- | 4 |
| Vulnerability Exploitation | CVEBench One-Day | -- | 4 |
| Cybersecurity vulnerability remediation | CVE-Bench | -- | 2 |
| Cybersecurity vulnerability remediation | CVE-Bench (zero-day) | -- | 1 |
| Cybersecurity vulnerability remediation | CVE-Bench 1.0 | -- | 1 |