
WebWalkerQA

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Web-based Question Answering | WebWalkerQA | Success Rate | 81.18 | 18 |
| Web Browsing and Navigation | WebWalkerQA | Average Accuracy | 71.7 | 18 |
| Advanced Question Answering | WebWalkerQA | Exact Match | 47.4 | 14 |
| Deep Research | WebWalkerQA original (test) | Pass@1 | 72.2 | 14 |
| Web-based Agent QA | WebWalkerQA | Pass@1 | 73.53 | 13 |
| web-agent QA | WebWalkerQA | F1 (Easy) | 11.4 | 8 |
| Web-based Agent Reasoning | WebWalkerQA Hard | Pass@3 | 0.6333 | 8 |
| Web-based Agent Reasoning | WebWalkerQA Medium | Pass@3 | 72.86 | 8 |
| Web-based Agent Reasoning | WebWalkerQA Easy | Pass@3 | 72.5 | 8 |
| Question Answering | WebWalkerQA (test) | EM | 46.3 | 6 |
| Deep search QA | WebwalkerQA | Accuracy | 23.01 | 6 |