Deep Search on xbench-DS
Best result: Qwen3-30B-A3B-thinking-SFT + SAPO, Accuracy 75
[Chart: Accuracy over time; x-axis date tick: May 28, 2026]
Evaluation Results
Method                             Context Budget  Date     Accuracy
Qwen3-30B-A3B-thinking-SFT + SAPO  128K            2026.05  75
Tongyi DeepResearch                128K            2026.05  69
Qwen3-30B-A3B-thinking-SFT + GRPO  128K            2026.05  67
OpenAI-o3                          -               2026.05  66.7
Qwen3-30B-A3B-thinking-SFT         128K            2026.05  53
Kimi K2                            -               2026.05  50
Web-30B-E-GRPO                     -               2026.05  48.5
Qwen3-8B-SFT + SAPO                64K             2026.05  22
Qwen3-8B-SFT + GRPO                64K             2026.05  20
Qwen3-8B-SFT                       64K             2026.05  18
Qwen3-8B-SFT + ARPO                64K             2026.05  16
MiroThinker-v1.0-8B                64K             2026.05  13.3
WebSailor-32B                      32K             2026.05  11
WebSailor-7B                       32K             2026.05  9.3
MiroThinker-v1.5-30B               128K            2026.05  5
WebExplorer-8B                     64K             2026.05  2
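To make the effect of each post-training method easier to read off the table, the short Python sketch below recomputes the accuracy change relative to the SFT-only baseline of the same base model. It uses only the Accuracy numbers listed in the Evaluation Results table; the dictionary layout and the `gains_over_sft` helper are illustrative assumptions, not part of the benchmark's own tooling. Each comparison stays within one base model, which also keeps the context budget fixed (128K for the 30B rows, 64K for the 8B rows).

```python
# Accuracy scores copied from the Evaluation Results table.
# Keys are (base model, post-training recipe); "SFT" is the shared baseline.
# The data layout and helper name are illustrative, not benchmark tooling.
RESULTS = {
    ("Qwen3-30B-A3B-thinking", "SFT"): 53,
    ("Qwen3-30B-A3B-thinking", "SFT + GRPO"): 67,
    ("Qwen3-30B-A3B-thinking", "SFT + SAPO"): 75,
    ("Qwen3-8B", "SFT"): 18,
    ("Qwen3-8B", "SFT + ARPO"): 16,
    ("Qwen3-8B", "SFT + GRPO"): 20,
    ("Qwen3-8B", "SFT + SAPO"): 22,
}


def gains_over_sft(results):
    """Return {(base, recipe): accuracy minus the SFT accuracy of the same base}.

    Rows whose recipe is plain "SFT" are skipped, since they are the baselines.
    """
    gains = {}
    for (base, recipe), acc in results.items():
        if recipe == "SFT":
            continue
        baseline = results[(base, "SFT")]
        gains[(base, recipe)] = acc - baseline
    return gains


if __name__ == "__main__":
    # Print one signed delta per RL variant, grouped by base model.
    for (base, recipe), delta in sorted(gains_over_sft(RESULTS).items()):
        print(f"{base:<24} {recipe:<12} {delta:+d}")
```

Running it prints one signed delta per RL variant: SAPO adds 22 points on the 30B model and 4 points on the 8B model, GRPO adds 14 and 2, and ARPO scores 2 points below its 8B SFT baseline.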