Agent Tool-use and Reasoning on SEAL (test)
Top result: ExpSeek, 51.97 Pass@3 (Jan 13, 2026)
Evaluation Results
Method               Base Model   Date      Pass@3
ExpSeek              Qwen3-32B    2026.01   51.97
ExpSeek              Qwen3-8B     2026.01   50
REASONINGBANK+       Qwen3-32B    2026.01   48.82
Training-Free GRPO   Qwen3-32B    2026.01   48.35
No Experience        Qwen3-32B    2026.01   47.64
REASONINGBANK+       Qwen3-8B     2026.01   44.09
Training-Free GRPO   Qwen3-8B     2026.01   42.13
No Experience        Qwen3-8B     2026.01   39.76