Web Navigation Question Answering on WebWalker QA (Pass@1)
[Chart: Pass@1 over time. Top result: Tongyi DeepResearch, 72.2 Pass@1, Oct 28, 2025.]
Updated 15d ago
Evaluation Results
Method               Agent Type       Date     Pass@1
Tongyi DeepResearch  DeepResearc...   2025.10  72.2
OpenAI o3            LLM-based R...   2025.10  71.7
GLM 4.5              LLM-based R...   2025.10  65.6
Kimi K2              LLM-based R...   2025.10  63
Claude-4-Sonnet      LLM-based R...   2025.10  61.7
DeepSeek-V3.1        LLM-based R...   2025.10  61.2