Web Browsing and Task Completion on BrowseComp
[Chart: Pass@1 Rate on BrowseComp. Best result shown: Tongyi DeepResearch, 58.3 (Oct 28, 2025).]
Evaluation Results
Method                 Agent Type        Date      Pass@1 Rate
Tongyi DeepResearch    DeepResearc...    2025.10   58.3
OpenAI DeepResearch    DeepResearc...    2025.10   51.5
OpenAI o3              LLM-based R...    2025.10   49.7
Tongyi DeepResearch    DeepResearc...    2025.10   43.4
DeepSeek-V3.1          LLM-based R...    2025.10   30
OpenAI o4-mini         LLM-based R...    2025.10   28.3
GLM 4.5                LLM-based R...    2025.10   26.4
Kimi K2                LLM-based R...    2025.10   14.1
Claude-4-Sonnet        LLM-based R...    2025.10   12.2