
Long-horizon agentic tasks on BrowseComp (Our Settings)

Top result: AgentSwing, Pass@1 = 62.5


Evaluation Results

Method	Date	Pass@1
AgentSwing	2026.03	62.5
AgentSwing	2026.03	60.5
AgentSwing	2026.03	60
DeepSeek-v3.2	2026.03	58
Tongyi-DR-30B-A3B	2026.03	58
Tongyi-DR-30B-A3B	2026.03	55
Tongyi-DR-30B-A3B	2026.03	53
GPT-OSS-120B	2026.03	52.5
DeepSeek-v3.2	2026.03	52
GPT-OSS-120B	2026.03	50.5
DeepSeek-v3.2	2026.03	48.5
GPT-OSS-120B	2026.03	48
Tongyi-DR-30B-A3B	2026.03	48
DeepSeek-v3.2	2026.03	43.5
GPT-OSS-120B	2026.03	39.5
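The page does not say how Pass@1 is computed for these settings. A common convention is the unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for a question attempted n times with c correct answers. For k = 1 it reduces to c/n, so with a single attempt per question Pass@1 is plain accuracy. The sketch below shows this under that assumption. The function names `pass_at_k` and `benchmark_pass_at_1` are hypothetical and are not taken from BrowseComp or any listed method.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question (assumed convention).

    n: number of attempts made on the question
    c: number of those attempts judged correct
    k: number of attempts the metric is allowed to use
    Returns the probability that at least one of k attempts, drawn
    without replacement from the n made, is correct.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k incorrect attempts: every k-subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(per_question: list[tuple[int, int]]) -> float:
    """Mean pass@1 over questions, as a percentage.

    per_question holds one (n attempts, c correct) pair per question.
    """
    scores = [pass_at_k(n, c, 1) for n, c in per_question]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical run: one attempt per question, 5 of 8 questions correct.
    results = [(1, 1)] * 5 + [(1, 0)] * 3
    print(benchmark_pass_at_1(results))  # 62.5
```

With one attempt per question, each question scores either 0 or 1, so the benchmark figure is the percentage of questions answered correctly. When several runs are made, the same function averages c/n per question, which smooths out run-to-run variance.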