Web-based Agent Reasoning on WebWalkerQA Hard
Top result: ExpSeek, Pass@3 0.6333 (Jan 13, 2026)
Evaluation Results
Method               Base Model   Date      Pass@3
ExpSeek              Qwen3-8B     2026.01   0.6333
ExpSeek              Qwen3-32B    2026.01   0.6333
No Experience        Qwen3-32B    2026.01   0.5833
Training-Free GRPO   Qwen3-32B    2026.01   0.5745
REASONINGBANK+       Qwen3-32B    2026.01   0.5644
Training-Free GRPO   Qwen3-8B     2026.01   0.5267
REASONINGBANK+       Qwen3-8B     2026.01   0.5
No Experience        Qwen3-8B     2026.01   0.4611