PentestAgent: Incorporating LLM Agents to Automated Penetration Testing

About

Penetration testing is a critical technique for identifying security vulnerabilities, traditionally performed manually by skilled security specialists. This complex process involves gathering information about the target system, identifying entry points, exploiting the system, and reporting findings. Despite its effectiveness, manual penetration testing is time-consuming and expensive, often requiring significant expertise and resources that many organizations cannot afford. While automated penetration testing methods have been proposed, they often fall short in real-world applications due to limitations in flexibility, adaptability, and implementation. Recent advancements in large language models (LLMs) offer new opportunities for enhancing penetration testing through increased intelligence and automation. However, current LLM-based approaches still face significant challenges, including limited penetration testing knowledge and a lack of comprehensive automation capabilities. To address these gaps, we propose PentestAgent, a novel LLM-based automated penetration testing framework that combines LLMs with techniques such as Retrieval-Augmented Generation (RAG) to enhance penetration testing knowledge and automate various tasks. Our framework uses multi-agent collaboration to automate the intelligence gathering, vulnerability analysis, and exploitation stages, reducing manual intervention. We evaluate PentestAgent using a comprehensive benchmark, demonstrating superior performance in task completion and overall efficiency. This work significantly advances the practical applicability of automated penetration testing systems.
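The abstract describes agents that collaborate across intelligence gathering, vulnerability analysis, and exploitation, with RAG supplying penetration-testing knowledge. The Python sketch below illustrates that general pattern only and is not PentestAgent's implementation. Every name in it (`Snippet`, `retrieve`, `make_agent`, `run_pipeline`, `STAGES`) is hypothetical. The LLM is an injected callable, and simple token-overlap ranking stands in for the embedding-based retrieval a real RAG system would typically use.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical LLM interface: takes a prompt string, returns a completion string.
LLM = Callable[[str], str]


@dataclass
class Snippet:
    """One entry in a toy penetration-testing knowledge base (hypothetical)."""
    title: str
    text: str


@dataclass
class PentestState:
    """Shared state that the stage agents read from and write to."""
    target: str
    findings: dict[str, str] = field(default_factory=dict)


def tokenize(text: str) -> set[str]:
    """Lowercase word set with simple punctuation stripped."""
    words = (w.strip(".,:;()").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(query: str, corpus: list[Snippet], k: int = 2) -> list[Snippet]:
    """Return up to k snippets ranked by token overlap with the query.

    A toy stand-in for embedding-based retrieval in a real RAG system.
    """
    q = tokenize(query)
    scored = [(len(q & tokenize(s.title + " " + s.text)), s) for s in corpus]
    scored = [pair for pair in scored if pair[0] > 0]  # drop irrelevant snippets
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def make_agent(stage: str, instruction: str, corpus: list[Snippet],
               llm: LLM) -> Callable[[PentestState], None]:
    """Build one stage agent: retrieve knowledge, prompt the LLM, record the result."""
    def run(state: PentestState) -> None:
        prior = "\n".join(f"{name}: {out}" for name, out in state.findings.items())
        context = retrieve(f"{instruction} {prior}", corpus)
        prompt = "\n".join([
            f"Stage: {stage}",
            f"Target: {state.target}",
            "Previous findings:",
            prior or "(none)",
            "Reference knowledge:",
            *(f"- {s.title}: {s.text}" for s in context),
            f"Task: {instruction}",
        ])
        state.findings[stage] = llm(prompt)
    return run


# Hypothetical stage list mirroring the three stages named in the abstract.
STAGES = [
    ("intelligence_gathering", "Enumerate open ports, services and versions."),
    ("vulnerability_analysis", "Map discovered services to known vulnerabilities."),
    ("exploitation", "Propose an exploit plan for the most promising vulnerability."),
]


def run_pipeline(target: str, corpus: list[Snippet], llm: LLM) -> PentestState:
    """Run the stage agents in order; each one sees the earlier agents' findings."""
    state = PentestState(target)
    for stage, instruction in STAGES:
        make_agent(stage, instruction, corpus, llm)(state)
    return state


if __name__ == "__main__":
    corpus = [
        Snippet("Service scan", "Use service version detection to enumerate ports."),
        Snippet("CVE lookup", "Map service versions to known vulnerabilities."),
    ]
    # Stub LLM that echoes the first prompt line, so the sketch runs offline.
    result = run_pipeline("10.0.0.5", corpus, lambda prompt: prompt.splitlines()[0])
    for name, output in result.findings.items():
        print(f"{name} -> {output}")
```

Passing each stage's output forward through a shared state object is one simple way to let later agents build on earlier ones. Swapping the stub for a real model client and the toy corpus for an indexed knowledge base would be the natural next steps, under whatever design the actual framework uses.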

Xiangmin Shen, Lingzhi Wang, Zhenyuan Li, Yan Chen, Wencheng Zhao, Dawei Sun, Jiashui Wang, Wei Ruan • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Web security task completion | XBOW 104 tasks | Task Completion Rate | 61 | 32
End-to-end penetration testing | PentestGPT Ben 13 machines | Machines Rooted | 9 | 30
Active Directory domain escalation | GOAD 5 hosts | Domain Escalation Success Rate | 40 | 30
Automated Exploit Generation | Penetration Testing Benchmark (test) | ASR (SI) | 46 | 28
Exploit Generation | Automated Exploit Generation Benchmark | SI Score | 2 | 9
Automated Penetration Testing | Vulhub | Success Rate | 50 | 6
Automated Penetration Testing | XBOW Level 1 (Easy) | Solved Count | 15 | 4
Automated Penetration Testing | XBOW Level 2 (Medium) | Solved Count | 9 | 4
Automated Penetration Testing | XBOW Level 3 (Hard) | Solved Count | 1 | 4
Automated Penetration Testing | XBOW (Overall) | Solved Count | 25 | 4