Agent Reasoning on xbench (test)
[Chart: Pass@3 by date. Plotted point: ExpSeek, 0.66, Jan 13, 2026]
Updated 4d ago
Evaluation Results

| Method             | Base Model | Date    | Pass@3 |
|--------------------|------------|---------|--------|
| ExpSeek            | Qwen3-32B  | 2026.01 | 0.66   |
| ExpSeek            | Qwen3-8B   | 2026.01 | 0.62   |
| REASONINGBANK+     | Qwen3-32B  | 2026.01 | 0.59   |
| Training-Free GRPO | Qwen3-32B  | 2026.01 | 0.536  |
| No Experience      | Qwen3-32B  | 2026.01 | 0.53   |
| REASONINGBANK+     | Qwen3-8B   | 2026.01 | 0.468  |
| Training-Free GRPO | Qwen3-8B   | 2026.01 | 0.45   |
| No Experience      | Qwen3-8B   | 2026.01 | 0.43   |
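
The results are easier to compare when each method is set against the "No Experience" row for the same base model. The sketch below does that arithmetic on the Pass@3 values listed above. Treating "No Experience" as the baseline is an assumption made here for illustration (the page does not label it as such), and the names `scores` and `gains_over_baseline` are hypothetical.

```python
# Pass@3 scores copied from the results above, keyed by (method, base model).
scores = {
    ("ExpSeek", "Qwen3-32B"): 0.66,
    ("ExpSeek", "Qwen3-8B"): 0.62,
    ("REASONINGBANK+", "Qwen3-32B"): 0.59,
    ("Training-Free GRPO", "Qwen3-32B"): 0.536,
    ("No Experience", "Qwen3-32B"): 0.53,
    ("REASONINGBANK+", "Qwen3-8B"): 0.468,
    ("Training-Free GRPO", "Qwen3-8B"): 0.45,
    ("No Experience", "Qwen3-8B"): 0.43,
}

# Assumption: "No Experience" serves as the reference row for each base model.
BASELINE = "No Experience"


def gains_over_baseline(scores, baseline=BASELINE):
    """Absolute Pass@3 gain of each method over the baseline with the same base model."""
    gains = {}
    for (method, model), value in scores.items():
        if method == baseline:
            continue
        # Round to 3 decimals to hide floating-point noise in the subtraction.
        gains[(method, model)] = round(value - scores[(baseline, model)], 3)
    return gains


if __name__ == "__main__":
    ranked = sorted(gains_over_baseline(scores).items(), key=lambda kv: -kv[1])
    for (method, model), gain in ranked:
        print(f"{method:<20} {model:<10} +{gain:.3f}")
```

Under that assumption, ExpSeek gains 0.19 on Qwen3-8B and 0.13 on Qwen3-32B, the largest differences among the listed methods for each base model.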