Sub-task Completion on AutoPenBench
[Chart: AC (Count), with selectable metrics AC, WS, NS, CRPT, Real-world Success and ALL. Top AC result shown: Qwen3-32B-finetune, 212 (Sep 16, 2025).]
Evaluation Results
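| Method | Framework | Date | AC | WS | NS | CRPT | Real-world Success | ALL (Overall Success) |
|---|---|---|---|---|---|---|---|---|
| Qwen3-32B-finetune | xOffense, No... | 2025.09 | 212 | 173 | 71 | 176 | 334 | 966 |
| Llama3.1-405B | VulnBot | 2025.09 | 107 | 116 | 40 | 75 | 186 | 524 |
| Llama3.3-70B | VulnBot | 2025.09 | 87 | 106 | 41 | 65 | 166 | 465 |
| Llama3.1-405B | Base | 2025.09 | 61 | 66 | 22 | 44 | 67 | 260 |
| Qwen3-32B | Base | 2025.09 | 60 | 70 | 10 | 70 | 155 | 365 |
| Llama3.3-70B | Base | 2025.09 | 46 | 83 | 36 | 68 | 99 | 332 |
| Llama3.1-405B | PentestGPT | 2025.09 | 27 | 40 | 15 | 43 | 56 | 181 |

All values are counts. AC, WS, NS and CRPT are AutoPenBench's in-vitro task categories (access control, web security, network security, cryptography); Real-world Success counts the real-world tasks; ALL is the overall success count. Rows are ordered by AC.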
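In every row, the ALL column equals the sum of the five category counts (for example, 212 + 173 + 71 + 176 + 334 = 966 for Qwen3-32B-finetune). The sketch below re-derives those totals and ranks the entries by them. The `Entry` type and the `ROWS` list are hypothetical helpers holding a hand transcription of the table, not an official data export or API of the benchmark.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    # One leaderboard row, hand-copied from the table (hypothetical structure)
    model: str
    framework: str
    ac: int
    ws: int
    ns: int
    crpt: int
    real_world: int
    reported_all: int

    def total(self) -> int:
        # Recompute the overall count as the sum of the five category counts
        return self.ac + self.ws + self.ns + self.crpt + self.real_world


# Transcribed from the Evaluation Results table (framework name kept truncated as listed)
ROWS: List[Entry] = [
    Entry("Qwen3-32B-finetune", "xOffense, No...", 212, 173, 71, 176, 334, 966),
    Entry("Llama3.1-405B", "VulnBot", 107, 116, 40, 75, 186, 524),
    Entry("Llama3.3-70B", "VulnBot", 87, 106, 41, 65, 166, 465),
    Entry("Llama3.1-405B", "Base", 61, 66, 22, 44, 67, 260),
    Entry("Qwen3-32B", "Base", 60, 70, 10, 70, 155, 365),
    Entry("Llama3.3-70B", "Base", 46, 83, 36, 68, 99, 332),
    Entry("Llama3.1-405B", "PentestGPT", 27, 40, 15, 43, 56, 181),
]


def check_totals(rows: List[Entry]) -> List[Entry]:
    """Return the rows whose reported ALL differs from the recomputed sum."""
    return [r for r in rows if r.total() != r.reported_all]


def ranking(rows: List[Entry]) -> List[Entry]:
    """Sort rows by recomputed overall count, highest first."""
    return sorted(rows, key=lambda r: r.total(), reverse=True)


if __name__ == "__main__":
    print("mismatches:", check_totals(ROWS))
    for r in ranking(ROWS):
        print(f"{r.model} ({r.framework}): {r.total()}")
```

Because the table is ordered by AC, ranking by the overall count changes the order among the Base entries. Qwen3-32B (365) and Llama3.3-70B (332) both move ahead of Llama3.1-405B (260).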