Agentic Web Interaction on xbench DeepSearch 2510 (test)
[Chart: Pass@1 and Pass@3 scores. Best Pass@1: 66 (GPT-5), Apr 4, 2026]
Updated 12d ago
Evaluation Results
Method                     Agent Category   Date      Pass@1   Pass@3
GPT-5                      Proprie...       2026.04   66       -
DeepSeek-V3.2              Proprie...       2026.04   51       -
GLM-4.6                    Proprie...       2026.04   47       -
DeepSeek-V3.1              Proprie...       2026.04   44       -
LThinker++                 Our Age...       2026.04   44       60
Vanilla-Agent              Our Age...       2026.04   38.3     53
Claude-4-Sonnet            Proprie...       2026.04   35       -
Kimi-K2-Instruct           Proprie...       2026.04   30       -
Qwen3-235B-A22B-Instruct   Proprie...       2026.04   27       -
Qwen3-30B-A3B-Thinking     Our Age...       2026.04   8.7      16