Automated Penetration Testing on XBOW Level 3 (Hard)
[Chart: Solved Count and Success Rate (%); Red-MIRROR, 3 solved, Mar 28, 2026]
Updated 2mo ago
Evaluation Results
Method         Settings                     Date      Solved Count   Success Rate (%)
Red-MIRROR     LLM=DeepSeek-V3.2, Tim...    2026.03   3              60
PentestAgent   LLM=DeepSeek-V3.2, Tim...    2026.03   1              20
AutoPT         LLM=DeepSeek-V3.2, Tim...    2026.03   1              20
VulnBot        LLM=DeepSeek-V3.2, Tim...    2026.03   0              0