
WebWalker

Benchmarks

| Task | Dataset | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Web Navigation Question Answering | WebWalker QA | Accuracy | 76.5 | 23 |
| Long-context Memory Retrieval and Reasoning | WebWalker 128K | F1 Score | 27.44 | 20 |
| Knowledge-Intensive Reasoning | WebWalker | F1 Score | 30.5 | 18 |
| Web-based Agent Task Completion | WebWalker | Success Rate (Config) | 53.5 | 10 |
| DeepSearch | WebWalker | Success Rate | 47.2 | 9 |
| Agentic Search | WebWalker | Accuracy | 72.7 | 9 |
| Search | WebWalker | Score | 59.5 | 7 |
| Web Search | WebWalker | Pass@1 | 61.7 | 6 |
| Web Browsing and Navigation | WebWalker | Score | 39.85 | 5 |
| Web Navigation | WebWalker 100 tasks (test) | Success Rate (Easy) | 0.125 | 4 |
| Deep Research | WebWalker | F1 Score | 33.02 | 4 |