Web-based Agent QA on WebWalkerQA
Top result: 73.53 Pass@1 (Agent KB), as of Feb 8, 2026
Evaluation Results
| Method | Configuration (truncated in source) | Date | Pass@1 |
| --- | --- | --- | --- |
| Agent KB | Model Family=GPT-4.1,... | 2026.02 | 73.53 |
| TodoEvolve + Smolagents | Model Family=GPT-5-Min... | 2026.02 | 73.53 |
| Flash-Searcher | Model Family=GPT-5-min... | 2026.02 | 71.18 |
| TodoEvolve + Smolagents | Model Family=DeepSeek... | 2026.02 | 70.59 |
| Flash-Searcher | Model Family=DeepSeek... | 2026.02 | 69.41 |
| Agent KB | Model Family=GPT-4.1,... | 2026.02 | 68.82 |
| TodoEvolve + Smolagents | Model Family=Kimi K2,... | 2026.02 | 64.71 |
| Cognitive Kernel-Pro | Model Family=Claude-3.... | 2026.02 | 60.64 |
| Agent KB | Model Family=GPT-4.1,... | 2026.02 | 60.59 |
| Smolagents | Model Family=GPT-5-min... | 2026.02 | 58.82 |
| OAgents | Model Family=Claude-3.... | 2026.02 | 58.23 |
| OWL Workforce | Model Family=GPT-4o+o3... | 2026.02 | 57.64 |
| Flash-Searcher | Model Family=Kimi K2,... | 2026.02 | 52.35 |