Long-horizon agentic tasks on HLE Our Settings
Top result: 44.4 Pass@1 (AgentSwing)
[Chart: Pass@1 over time; data point dated Mar 29, 2026]
Updated 19d ago
Evaluation Results

Method             Setting                    Date     Pass@1
AgentSwing         Base Model=DeepSeek-v3.2   2026.03  44.4
DeepSeek-v3.2      Context Management=Sum...  2026.03  43.5
DeepSeek-v3.2      Context Management=Dis...  2026.03  42
DeepSeek-v3.2      Context Management=Bas...  2026.03  40.2
DeepSeek-v3.2      Context Management=Kee...  2026.03  39.6
AgentSwing         Base Model=GPT-OSS-120B    2026.03  35.1
GPT-OSS-120B       Context Management=Sum...  2026.03  34.4
GPT-OSS-120B       Context Management=Dis...  2026.03  34.2
GPT-OSS-120B       Context Management=Kee...  2026.03  34.1
GPT-OSS-120B       Context Management=Bas...  2026.03  33.2
AgentSwing         Base Model=Tongyi-DR-3...  2026.03  33.1
Tongyi-DR-30B-A3B  Context Management=Dis...  2026.03  32.7
Tongyi-DR-30B-A3B  Context Management=Kee...  2026.03  32.2
Tongyi-DR-30B-A3B  Context Management=Sum...  2026.03  32
Tongyi-DR-30B-A3B  Context Management=Bas...  2026.03  31.7