Information-Seeking on XBench 2505 (full)
[Chart: pass@1 by date. Top result: NestBrowse-30B-A3B, 75 pass@1. Date shown: Dec 29, 2025.]
Updated 4d ago
Evaluation Results
| Method | Web Toolkit | Date | pass@1 |
| --- | --- | --- | --- |
| NestBrowse-30B-A3B | browser (t... | 2025.12 | 75 |
| NestBrowse-4B | browser (t... | 2025.12 | 74 |
| GLM-4.5-355B | not reported | 2025.12 | 70 |
| Kimi Researcher | browser (t... | 2025.12 | 69 |
| OpenAI-o3 | browser | 2025.12 | 66.7 |
| WebLeaper-30B-A3B-RU | search, visit | 2025.12 | 66 |
| Claude-4-Sonnet | not reported | 2025.12 | 64.6 |
| WebSailor-V2-30B-A3B-SFT | search, visit | 2025.12 | 61.7 |
| WebSailor-72B | search, visit | 2025.12 | 55 |
| WebExplorer-8B | search, visit | 2025.12 | 53.7 |
| WebSailor-32B | search, visit | 2025.12 | 53.3 |
| DeepDiver-V2-38B | search | 2025.12 | 53 |
| DeepDive-32B | search, visit | 2025.12 | 50.5 |
| Kimi-K2-Instruct-1T | search, visit | 2025.12 | 50 |
| ASearcher-Web-32B | search, visit | 2025.12 | 42.1 |
| WebDancer-QwQ-32B | search, visit | 2025.12 | 38.3 |
| WebShaper-QwQ-32B | search, visit | 2025.12 | 35 |