Sub-task Completion on AutoPenBench

AC (Count): 212

Qwen3-32B-finetune

Updated 2mo ago

Evaluation Results

Method	Links	AC (Count)
Qwen3-32B-finetune 2025.09		212	173	71	176	334	966
Llama3.1-405B 2025.09		107	116	40	75	186	524
Llama3.3-70B 2025.09		87	106	41	65	166	465
Llama3.1-405B 2025.09		61	66	22	44	67	260
Qwen3-32B 2025.09		60	70	10	70	155	365
Llama3.3-70B 2025.09		46	83	36	68	99	332
Llama3.1-405B 2025.09		27	40	15	43	56	181