
Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis

About

Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific reasoning, and accurate action and tool selection. To overcome these limitations, we propose the Pen-Strategist framework, consisting of a novel domain-specific reasoning model that derives pentesting strategies via logical reasoning and a classifier that converts the strategies into actionable steps. First, we construct a reasoning dataset containing logical explanations for both strategy derivation and step selection in pentesting scenarios. We then fine-tune a Qwen-3-14B model for strategy generation using reinforcement learning. Evaluation on the test split of the dataset demonstrates an 87% improvement in strategy derivation performance compared to the baseline. Furthermore, we integrate the fine-tuned Pen-Strategist model into existing automated pentesting frameworks, such as PentestGPT, and evaluate its performance on vulnerable machines, achieving a 47.5% improvement in subtask completion while surpassing the baseline GPT-5. Further experiments on the CTFKnow benchmark show an 18% performance gain over the base model. For step prediction, we train a semantic-based CNN classifier, which outperforms commercial LLMs by 28% and enhances execution stability. Finally, we conduct a user study to qualitatively assess the generated strategies, and Pen-Strategist demonstrates superior performance compared to Claude-4.6-Sonnet.
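
The two-stage design described above (a reasoning model that derives a strategy, followed by a classifier that maps the strategy to a concrete step) can be illustrated with a minimal interface sketch. Everything here is an assumption made for illustration: the names `Strategy`, `run_pipeline`, `stub_generator`, and `stub_classifier` are hypothetical, the generator is a hard-coded stand-in for the fine-tuned model, and the word-overlap heuristic is a toy substitute for the CNN classifier, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Strategy:
    """A strategy plus the reasoning that led to it (both free text)."""
    reasoning: str
    text: str


def run_pipeline(
    scenario: str,
    generate_strategy: Callable[[str], Strategy],
    classify_step: Callable[[str, Sequence[str]], str],
    candidate_steps: Sequence[str],
) -> tuple[Strategy, str]:
    """Stage 1 derives a strategy; stage 2 maps it to one candidate step."""
    strategy = generate_strategy(scenario)
    step = classify_step(strategy.text, candidate_steps)
    return strategy, step


# Hypothetical stand-in for the fine-tuned reasoning model.
def stub_generator(scenario: str) -> Strategy:
    return Strategy(
        reasoning="Open web port observed, so enumerate the web service first.",
        text="enumerate web directories on the target",
    )


# Hypothetical stand-in for the step classifier (toy word-overlap heuristic).
def stub_classifier(strategy_text: str, steps: Sequence[str]) -> str:
    words = set(strategy_text.lower().split())
    return max(steps, key=lambda s: len(words & set(s.lower().split())))


STEPS = ["scan ports", "enumerate web directories", "crack password hashes"]
strategy, step = run_pipeline(
    "Port 80 is open on the target.", stub_generator, stub_classifier, STEPS
)
print(step)
```

Passing the two stages as plain callables keeps the sketch independent of any particular model or classifier; a real system would swap the stubs for the trained components.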

Yasod Ginige, Pasindu Marasinghe, Sajal Jain, Suranga Seneviratne • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Pentesting | PicoCTF | Success Rate (out of 5): 60 | 33 |
| Pentesting Strategy Generation | Pentesting Scenarios (test) | Strategy Success Rate: 73 | 11 |
| Pentesting Explanation Generation | Pentesting Scenarios (test) | Explanation Score: 71 | 11 |
| MCP Server Prediction | Pen-Strategist (test) | Accuracy: 48.88 | 10 |
| Step Prediction | Pen-Strategist (test) | Accuracy: 82.87 | 10 |
| Capture The Flag (CTF) | CTF Known | Web Score: 81.31 | 9 |