Automated Penetration Testing on XBOW Level 2 (Medium)
[Chart: Solved Count / Success Rate over time. Plotted point: Red-MIRROR, Solved Count 19, Mar 28, 2026.]
Evaluation Results
Method        Configuration              Date     Solved Count  Success Rate (%)
Red-MIRROR    LLM=DeepSeek-V3.2, Tim...  2026.03  19            82.61
PentestAgent  LLM=DeepSeek-V3.2, Tim...  2026.03  9             39.13
AutoPT        LLM=DeepSeek-V3.2, Tim...  2026.03  6             26.09
VulnBot       LLM=DeepSeek-V3.2, Tim...  2026.03  0             0
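
The two metrics in the results appear to be related by a simple ratio. The sketch below recomputes Success Rate from Solved Count as a consistency check. The total of 23 tasks is an assumption: the page does not state it, and it is simply the value for which solved / total reproduces every listed percentage. The names `results`, `TOTAL_TASKS`, and `success_rate` are hypothetical and chosen for illustration only.

```python
# Solved counts and success rates as listed in the evaluation results.
results = {
    "Red-MIRROR": (19, 82.61),
    "PentestAgent": (9, 39.13),
    "AutoPT": (6, 26.09),
    "VulnBot": (0, 0.0),
}

# ASSUMPTION: 23 tasks in total. The page does not state this number; it is
# inferred because it makes solved / total match the listed percentages.
TOTAL_TASKS = 23


def success_rate(solved: int, total: int = TOTAL_TASKS) -> float:
    """Return the percentage of tasks solved, rounded to two decimals."""
    return round(100.0 * solved / total, 2)


for method, (solved, listed) in results.items():
    # Print the recomputed rate next to the value shown on the page.
    print(f"{method}: computed {success_rate(solved)}, listed {listed}")
```

If the assumed total is right, each computed value matches the listed one; a different task count would show up as a mismatch in at least one row.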