Sub-task Completion on AI-Pentest-Benchmark (Single Experiment)

AC Score: 46

Qwen3-32B-finetune (Ours)

Evaluation Results

Method	Date	Links	AC Score					Total
Qwen3-32B-finetune (Ours)	2025.09		46	38	15	38	114	251
Llama3.1-405B (VulnBot)	2025.09		31	30	11	18	55	145
Qwen3-32B (Base)	2025.09		26	28	11	22	79	166
Llama3.3-70B (VulnBot)	2025.09		25	24	12	15	49	125
Llama3.1-405B (Base)	2025.09		21	26	9	18	29	103
Llama3.1-405B (PentestGPT)	2025.09		20	18	6	12	28	84
Llama3.3-70B (Base)	2025.09		16	22	10	17	29	94
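In every row of the table above, the last column equals the sum of the five score columns before it (for example, 46 + 38 + 15 + 38 + 114 = 251). The sketch below checks that relationship and compares two rankings. It assumes that the first score column is the AC Score headline metric, because its top value of 46 matches the score shown for the leading entry. The other four score columns are not labeled on the page, and the names `ROWS`, `check_totals` and `rank_by` are hypothetical helpers for illustration only.

```python
# Rows transcribed from the results table:
# (method, AC Score [assumed], four unlabeled score columns, last column).
ROWS = [
    ("Qwen3-32B-finetune (Ours)", 46, 38, 15, 38, 114, 251),
    ("Llama3.1-405B (VulnBot)", 31, 30, 11, 18, 55, 145),
    ("Qwen3-32B (Base)", 26, 28, 11, 22, 79, 166),
    ("Llama3.3-70B (VulnBot)", 25, 24, 12, 15, 49, 125),
    ("Llama3.1-405B (Base)", 21, 26, 9, 18, 29, 103),
    ("Llama3.1-405B (PentestGPT)", 20, 18, 6, 12, 28, 84),
    ("Llama3.3-70B (Base)", 16, 22, 10, 17, 29, 94),
]


def check_totals(rows):
    """Return the methods whose last column is NOT the sum of the five score columns."""
    return [row[0] for row in rows if sum(row[1:6]) != row[6]]


def rank_by(rows, index):
    """Return method names sorted by the value at `index`, highest first."""
    return [row[0] for row in sorted(rows, key=lambda row: row[index], reverse=True)]


if __name__ == "__main__":
    print(check_totals(ROWS))     # []
    print(rank_by(ROWS, 6)[:3])   # ['Qwen3-32B-finetune (Ours)', 'Qwen3-32B (Base)', 'Llama3.1-405B (VulnBot)']
```

Ranking by the AC Score column reproduces the table's order. Ranking by the last column moves Qwen3-32B (Base) from third to second place, ahead of Llama3.1-405B (VulnBot). It also moves Llama3.3-70B (Base) ahead of Llama3.1-405B (PentestGPT).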