General AI Assistant Tasks on GAIA (test)
[Chart: Pass@3 by submission date. Top result: ExpSeek, 63.11 (Jan 13, 2026).]
Evaluation Results
Method              Base Model   Date      Pass@3
ExpSeek             Qwen3-32B    2026.01   63.11
Training-Free GRPO  Qwen3-32B    2026.01   56.62
ExpSeek             Qwen3-8B     2026.01   54.37
No Experience       Qwen3-32B    2026.01   54.37
REASONINGBANK+      Qwen3-8B     2026.01   48.7
Training-Free GRPO  Qwen3-8B     2026.01   48.54
No Experience       Qwen3-8B     2026.01   47.57
REASONINGBANK+      Qwen3-32B    2026.01   46.6
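To read the table as "how much does each method add over using no experience at all", you can subtract the No Experience score for the same base model. The sketch below does that arithmetic on the Pass@3 values listed above. It assumes that rows sharing a base model are directly comparable and that "No Experience" is the right baseline for each model; the names RESULTS and gains_over_baseline are hypothetical helpers for illustration, not part of any leaderboard API.

```python
# Pass@3 scores on GAIA (test), copied from the leaderboard table.
# Keys are (method, base_model); values are Pass@3 in percent.
RESULTS = {
    ("ExpSeek", "Qwen3-32B"): 63.11,
    ("Training-Free GRPO", "Qwen3-32B"): 56.62,
    ("ExpSeek", "Qwen3-8B"): 54.37,
    ("No Experience", "Qwen3-32B"): 54.37,
    ("REASONINGBANK+", "Qwen3-8B"): 48.7,
    ("Training-Free GRPO", "Qwen3-8B"): 48.54,
    ("No Experience", "Qwen3-8B"): 47.57,
    ("REASONINGBANK+", "Qwen3-32B"): 46.6,
}

# Assumption: the "No Experience" row is the baseline for each base model.
BASELINE = "No Experience"


def gains_over_baseline(results, baseline=BASELINE):
    """Return {(method, base_model): score minus the baseline score for that base model},
    in percentage points, rounded to two decimals. Baseline rows are excluded."""
    base_scores = {
        model: score
        for (method, model), score in results.items()
        if method == baseline
    }
    return {
        (method, model): round(score - base_scores[model], 2)
        for (method, model), score in results.items()
        if method != baseline
    }


if __name__ == "__main__":
    # Print methods from largest to smallest gain over the baseline.
    gains = gains_over_baseline(RESULTS)
    for (method, model), delta in sorted(gains.items(), key=lambda kv: -kv[1]):
        print(f"{method:20s} {model:10s} {delta:+.2f}")
```

Under these assumptions, ExpSeek shows the largest gain for both base models (+8.74 points on Qwen3-32B, +6.80 on Qwen3-8B), while REASONINGBANK+ on Qwen3-32B falls 7.77 points below its baseline.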