Deep Research on xbench-DS
Best Pass@1: 71 (DeepSeek-V3.1), Oct 28, 2025
Metrics: Pass@1, Pass@3
Evaluation Results
| Method | Details | Date | Pass@1 | Pass@3 |
|---|---|---|---|---|
| DeepSeek-V3.1 | Model Category=Advance... | 2025.10 | 71 | - |
| OpenAI-o3 | Model Category=Advance... | 2025.10 | 66.7 | - |
| Claude-4-Sonnet | Model Category=Advance... | 2025.10 | 64.6 | - |
| WebSailor | Model Size=32B | 2025.10 | 53.3 | - |
| Kimi-K2 | Model Category=Advance... | 2025.10 | 50 | - |
| Web-30B-E-GRPO | Model Size=30B, Traini... | 2025.10 | 46.7 | 66 |
| Web-30B-GRPO | Model Size=30B, Traini... | 2025.10 | 45.3 | 65 |
| Web-30B-SFT | Model Size=30B, Traini... | 2025.10 | 43.7 | 63 |
| Web-7B-E-GRPO | Model Size=7B, Trainin... | 2025.10 | 42 | 59 |
| Web-7B-GRPO | Model Size=7B, Trainin... | 2025.10 | 40.7 | 56 |
| WebDancer-QwQ | | 2025.10 | 39 | - |
| Web-7B-SFT | Model Size=7B, Trainin... | 2025.10 | 37.3 | 55 |
| WebSailor | Model Size=7B | 2025.10 | 34.3 | - |
| WebThinker-RL | | 2025.10 | 24 | - |
| R1-Searcher | Model Size=7B | 2025.10 | 4 | - |