
WebWalker

Benchmarks

Task Name                                   | Dataset Name   | SOTA Result     | Trend
--------------------------------------------|----------------|-----------------|------
Web Navigation Question Answering           | WebWalker QA   | Accuracy 76.5   | 23
Long-context Memory Retrieval and Reasoning | WebWalker 128K | F1 Score 27.44  | 20
Knowledge-Intensive Reasoning               | WebWalker      | F1 Score 30.5   | 18
Search                                      | WebWalker      | Score 59.5      | 7
Web Search                                  | WebWalker      | Pass@1 61.7     | 6
Deep Research                               | WebWalker      | F1 Score 33.02  | 4