Long-horizon agentic tasks on HLE Our Settings

44.4 Pass@1

AgentSwing

Updated 3mo ago

Evaluation Results

Method	Links	Pass@1
AgentSwing 2026.03		44.4
DeepSeek-v3.2 2026.03		43.5
DeepSeek-v3.2 2026.03		42.0
DeepSeek-v3.2 2026.03		40.2
DeepSeek-v3.2 2026.03		39.6
AgentSwing 2026.03		35.1
GPT-OSS-120B 2026.03		34.4
GPT-OSS-120B 2026.03		34.2
GPT-OSS-120B 2026.03		34.1
GPT-OSS-120B 2026.03		33.2
AgentSwing 2026.03		33.1
Tongyi-DR-30B-A3B 2026.03		32.7
Tongyi-DR-30B-A3B 2026.03		32.2
Tongyi-DR-30B-A3B 2026.03		32.0
Tongyi-DR-30B-A3B 2026.03		31.7