Pentesting Strategy Generation on Pentesting Scenarios (test)
[Chart: Strategy Success Rate. Highlighted point: Qwen-3-14B-GRPO, 73, May 6, 2026.]
Updated 27d ago
Evaluation Results
Method             Details                      Date      Strategy Success Rate
Qwen-3-14B-GRPO    Parameters=14B, Optimi...    2026.05   73
Claude 4.5 Sonnet                               2026.05   65
GPT-5                                           2026.05   62
GPT 4.1                                         2026.05   60
Claude 3 Haiku                                  2026.05   54
GPT-4o-mini                                     2026.05   52
Gemini 2.5 Flash                                2026.05   45
Gemini 2.0 Flash                                2026.05   40
LLaMA-3.1-8B       Parameters=8B                2026.05   40
Qwen-3-14B         Parameters=14B               2026.05   39
GPT-3.5-turbo                                   2026.05   36