Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis

About

Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific reasoning, and accurate action and tool selection. To overcome these limitations, we propose the Pen-Strategist framework, consisting of a novel domain-specific reasoning model that derives pentesting strategies via logical reasoning and a classifier that converts the strategies into actionable steps. First, we construct a reasoning dataset containing logical explanations for both strategy derivation and step selection in pentesting scenarios. We then fine-tune a Qwen-3-14B model for strategy generation using reinforcement learning. Evaluation on the test split of the dataset demonstrates an 87% improvement in strategy derivation performance compared to the baseline. Furthermore, we integrate the fine-tuned Pen-Strategist model into existing automated pentesting frameworks, such as PentestGPT, and evaluate its performance on vulnerable machines, achieving a 47.5% improvement in subtask completion while surpassing the baseline GPT-5. Further experiments on the CTFKnow benchmark show an 18% performance gain over the base model. For step prediction, we train a semantic-based CNN classifier, which outperforms commercial LLMs by 28% and enhances execution stability. Finally, we conduct a user study to qualitatively assess the generated strategies, and Pen-Strategist demonstrates superior performance compared to Claude-4.6-Sonnet.

Yasod Ginige, Pasindu Marasinghe, Sajal Jain, Suranga Seneviratne • 2026
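
The two-stage structure described in the abstract (a reasoning model derives a strategy, then a classifier maps it to an actionable step) can be illustrated with a minimal sketch. Everything named here is hypothetical: the step labels, the `derive_strategy` rule, and the keyword sets are placeholders, not the paper's components. In particular, the step classifier below is a simple keyword-overlap stand-in, not the semantic CNN the authors train, and the strategy function is a hard-coded stub in place of the fine-tuned language model.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical step labels and keywords; the paper's actual label set is not given here.
STEP_KEYWORDS = {
    "port_scan": {"scan", "ports", "services", "enumerate"},
    "web_enum": {"web", "directories", "http", "endpoints"},
    "exploit": {"exploit", "payload", "vulnerability", "shell"},
}


@dataclass
class Strategy:
    reasoning: str  # logical explanation behind the strategy
    text: str       # the strategy itself


def derive_strategy(scenario: str) -> Strategy:
    """Stub standing in for the reasoning model (a real system would query an LLM)."""
    if "http" in scenario.lower():
        return Strategy(
            reasoning="An HTTP service is exposed, so hidden content may reveal an entry point.",
            text="Enumerate web directories and endpoints on the HTTP service.",
        )
    return Strategy(
        reasoning="Nothing is known about the target yet, so discovery comes first.",
        text="Scan the target to enumerate open ports and services.",
    )


def classify_step(strategy: Strategy) -> str:
    """Stand-in step classifier: picks the label with the largest keyword overlap (not a CNN)."""
    words = {w.strip(".,").lower() for w in strategy.text.split()}
    return max(STEP_KEYWORDS, key=lambda label: len(words & STEP_KEYWORDS[label]))


def plan(scenario: str) -> Tuple[Strategy, str]:
    """Run both stages: derive a strategy, then map it to a single step label."""
    strategy = derive_strategy(scenario)
    return strategy, classify_step(strategy)
```

Calling `plan("Target exposes HTTP on port 80")` returns the stub strategy together with a step label. Keeping the two stages behind separate functions mirrors the split in the abstract, so that either placeholder could be swapped for a trained model without touching the other.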

Related benchmarks

Task	Dataset	Metric	Result	
Pentesting	PicoCTF	Success Rate (out of 5)	60	33
Pentesting Strategy Generation	Pentesting Scenarios (test)	Strategy Success Rate	73	11
Pentesting Explanation Generation	Pentesting Scenarios (test)	Explanation Score	71	11
MCP Server Prediction	Pen-Strategist (test)	Accuracy	48.88	10
Step Prediction	Pen-Strategist (test)	Accuracy	82.87	10
Capture The Flag (CTF)	CTF Known	Web Score	81.31	9

