Search Agent Evaluation on BC-ZH
[Chart: Average Score. Best result: MiniMax-M2.1, 66.6 (May 12, 2026). Updated 20d ago.]
Evaluation Results
| Method | Attribute | Date | Average Score |
|---|---|---|---|
| MiniMax-M2.1 | Model Category=Foundat... | 2026.05 | 66.6 |
| DeepSeek-V3.2 | Model Category=Foundat... | 2026.05 | 65 |
| GPT-5 High | Model Category=Foundat... | 2026.05 | 65 |
| Qwen3-8B + ACTGUIDE-RL | Training=ACTGUIDE-RL | 2026.05 | 26.64 |
| Qwen3-8B | Training=Base | 2026.05 | 23.52 |
| Qwen3-8B + RL | Training=RL Baseline | 2026.05 | 21.79 |
| Qwen3-4B-Instruct + ACTGUIDE-RL | Training=ACTGUIDE-RL | 2026.05 | 20.41 |
| Qwen3-4B-Instruct + RL | Training=RL Baseline | 2026.05 | 15.26 |
| WebSailor-7B | Model Category=Search-... | 2026.05 | 14.2 |
| Qwen2.5-7B-Instruct + ACTGUIDE-RL | Training=ACTGUIDE-RL | 2026.05 | 8.31 |
| Qwen3-4B-Instruct | Training=Base | 2026.05 | 7.96 |
| WebThinker-32B-RL | Model Category=Search-... | 2026.05 | 7.3 |
| Qwen2.5-7B-Instruct + RL | Training=RL Baseline | 2026.05 | 4.84 |
| Qwen2.5-3B-Instruct + ACTGUIDE-RL | Training=ACTGUIDE-RL | 2026.05 | 4.5 |
| Qwen2.5-7B-Instruct | Training=Base | 2026.05 | 4.5 |
| Qwen2.5-3B-Instruct + RL | Training=RL Baseline | 2026.05 | 2.42 |
| Qwen2.5-3B-Instruct | Training=Base | 2026.05 | 2.08 |