Web Search on xbench
[Chart: Average Score. Top entry: OpenAI-o3, 66 (Jan 29, 2026).]
Evaluation Results
| Method                | Configuration            | Date    | Average Score |
|-----------------------|--------------------------|---------|---------------|
| OpenAI-o3             |                          | 2026.01 | 66            |
| Claude-4-Sonnet       |                          | 2026.01 | 64            |
| Reagent-U             | Backbone=Qwen3-8B        | 2026.01 | 43            |
| Reagent-R             | Backbone=Qwen3-8B        | 2026.01 | 41            |
| Search-o1             | Backbone=QwQ-32B-Preview | 2026.01 | 40            |
| WebDancer             | Backbone=Qwen2.5-32B     | 2026.01 | 38            |
| DeepSeek-R1-671B      | Backbone=DeepSeek-R1-671B | 2026.01 | 32           |
| ARPO                  | Backbone=Qwen3-14B       | 2026.01 | 32            |
| Reagent w/o Agent-RRM | Backbone=Qwen3-8B        | 2026.01 | 32            |
| ARPO                  | Backbone=Qwen3-8B        | 2026.01 | 25            |
| Atom-Searcher         | Backbone=Qwen2.5-7B      | 2026.01 | 21            |
| Reagent-C             | Backbone=Qwen3-8B, Inf... | 2026.01 | 15           |
| WebThinker            | Backbone=Qwen3-8B        | 2026.01 | 13            |
| QwQ-32B               | Backbone=QwQ-32B         | 2026.01 | 10            |
| Qwen3-8B              | Backbone=Qwen3-8B        | 2026.01 | 9             |