Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis
About
Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific reasoning, and accurate action and tool selection. To overcome these limitations, we propose the Pen-Strategist framework, which consists of a novel domain-specific reasoning model that derives pentesting strategies via logical reasoning and a classifier that converts the strategies into actionable steps. First, we construct a reasoning dataset containing logical explanations for both strategy derivation and step selection in pentesting scenarios. We then fine-tune a Qwen-3-14B model for strategy generation using reinforcement learning. Evaluation on the test split of the dataset demonstrates an 87% improvement in strategy derivation performance compared to the baseline. Furthermore, we integrate the fine-tuned Pen-Strategist model into existing automated pentesting frameworks, such as PentestGPT, and evaluate its performance on vulnerable machines, achieving a 47.5% improvement in subtask completion while surpassing the GPT-5 baseline. Further experiments on the CTFKnow benchmark show an 18% performance gain over the base model. For step prediction, we train a semantic-based CNN classifier, which outperforms commercial LLMs by 28% and enhances execution stability. Finally, we conduct a user study to qualitatively assess the generated strategies, in which Pen-Strategist demonstrates superior performance compared to Claude-4.6-Sonnet.
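The two-stage design described above (a reasoning model that derives a strategy, followed by a classifier that maps the strategy to an actionable step) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Strategy` type, the step labels, and both stub functions are hypothetical stand-ins, and the keyword lookup is only a placeholder for the trained CNN classifier, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Strategy:
    reasoning: str  # logical explanation behind the strategy
    text: str       # the strategy itself


# Hypothetical closed set of step labels; the actual label set is not given in the abstract.
STEP_LABELS = ["reconnaissance", "enumeration", "exploitation", "privilege_escalation"]


def stub_strategist(scenario: str) -> Strategy:
    """Stand-in for the fine-tuned reasoning model (assumption, returns a fixed strategy)."""
    return Strategy(
        reasoning=f"No services are known yet for: {scenario}",
        text="Scan the target for open ports and services.",
    )


def stub_step_classifier(strategy_text: str) -> str:
    """Stand-in for the step classifier: a keyword lookup, not a CNN."""
    keywords = {
        "scan": "reconnaissance",
        "enumerate": "enumeration",
        "exploit": "exploitation",
        "escalate": "privilege_escalation",
    }
    lowered = strategy_text.lower()
    for word, label in keywords.items():
        if word in lowered:
            return label
    # Fall back to the first label when no keyword matches.
    return STEP_LABELS[0]


def plan_next_step(
    scenario: str,
    strategist: Callable[[str], Strategy] = stub_strategist,
    classifier: Callable[[str], str] = stub_step_classifier,
) -> Tuple[Strategy, str]:
    """Stage 1 derives a strategy; stage 2 maps it onto a fixed set of steps."""
    strategy = strategist(scenario)
    step = classifier(strategy.text)
    return strategy, step


if __name__ == "__main__":
    strategy, step = plan_next_step("single Linux host, no prior information")
    print(strategy.text)
    print(step)
```

Constraining the second stage to a closed label set is one plausible reading of how a classifier could improve execution stability: the downstream framework only ever receives one of a known set of steps rather than free-form text.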
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Pentesting | PicoCTF | Success Rate (out of 5) | 60 | 33 |
| Pentesting Strategy Generation | Pentesting Scenarios (test) | Strategy Success Rate | 73 | 11 |
| Pentesting Explanation Generation | Pentesting Scenarios (test) | Explanation Score | 71 | 11 |
| MCP Server Prediction | Pen-Strategist (test) | Accuracy | 48.88 | 10 |
| Step Prediction | Pen-Strategist (test) | Accuracy | 82.87 | 10 |
| Capture The Flag (CTF) | CTF Known | Web Score | 81.31 | 9 |