Search Agent Evaluation on XBench
[Chart: Average Score. Best result: DeepSeek-V3.2, 78 (May 12, 2026).]
Evaluation Results
Method                             Tag                        Date     Average Score
DeepSeek-V3.2                      Model Category=Foundat...  2026.05  78
GPT-5 High                         Model Category=Foundat...  2026.05  77
MiniMax-M2.1                       Model Category=Foundat...  2026.05  68
Qwen3-8B + ACTGUIDE-RL             Training=ACTGUIDE-RL       2026.05  44
Qwen3-4B-Instruct + ACTGUIDE-RL    Training=ACTGUIDE-RL       2026.05  37
WebSailor-7B                       Model Category=Search-...  2026.05  34
Qwen3-8B + RL                      Training=RL Baseline       2026.05  33
Qwen3-8B                           Training=Base              2026.05  32
ARPO-8B                            Model Category=Search-...  2026.05  25
WebThinker-32B-RL                  Model Category=Search-...  2026.05  24
Qwen2.5-7B-Instruct + ACTGUIDE-RL  Training=ACTGUIDE-RL       2026.05  24
Qwen2.5-7B-Instruct + RL           Training=RL Baseline       2026.05  22
Qwen2.5-7B-Instruct                Training=Base              2026.05  19
Qwen3-4B-Instruct + RL             Training=RL Baseline       2026.05  18
Qwen2.5-3B-Instruct + ACTGUIDE-RL  Training=ACTGUIDE-RL       2026.05  16
Qwen3-4B-Instruct                  Training=Base              2026.05  14
Qwen2.5-3B-Instruct + RL           Training=RL Baseline       2026.05  10
Qwen2.5-3B-Instruct                Training=Base              2026.05  8
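When reading the results, it can help to line up each Qwen backbone's Base, RL Baseline, and ACTGUIDE-RL rows side by side. The Python sketch here does that. The scores are copied from the table rows; the `SCORES` layout and the `gains` helper are hypothetical names introduced only for illustration, and the differences are raw Average Score points that carry no information about variance or significance.

```python
# Average Scores copied from the XBench Evaluation Results table.
# Illustrative layout (hypothetical): backbone -> {training variant: score}.
SCORES = {
    "Qwen3-8B": {"Base": 32, "RL": 33, "ACTGUIDE-RL": 44},
    "Qwen3-4B-Instruct": {"Base": 14, "RL": 18, "ACTGUIDE-RL": 37},
    "Qwen2.5-7B-Instruct": {"Base": 19, "RL": 22, "ACTGUIDE-RL": 24},
    "Qwen2.5-3B-Instruct": {"Base": 8, "RL": 10, "ACTGUIDE-RL": 16},
}


def gains(scores):
    """Return raw point differences between training variants for each backbone."""
    result = {}
    for backbone, s in scores.items():
        result[backbone] = {
            "RL over Base": s["RL"] - s["Base"],
            "ACTGUIDE-RL over Base": s["ACTGUIDE-RL"] - s["Base"],
            "ACTGUIDE-RL over RL": s["ACTGUIDE-RL"] - s["RL"],
        }
    return result


if __name__ == "__main__":
    # Print one line per backbone with signed differences, e.g. "+12".
    for backbone, g in gains(SCORES).items():
        parts = ", ".join(f"{k}: {v:+d}" for k, v in g.items())
        print(f"{backbone}: {parts}")
```

Running it shows ACTGUIDE-RL above the RL baseline on all four backbones, with gaps ranging from 2 points (Qwen2.5-7B-Instruct) to 19 points (Qwen3-4B-Instruct).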