Multi-modal Agent Reasoning on GAIA (full)
Chart: Pass@1 and Pass@3 over time. Best Pass@1: 38.8 (Reagent-U, Jan 29, 2026).
Evaluation Results
Method | Date | Pass@1 | Pass@3
Reagent-U | 2026.01 | 38.8 | 53.9
MCP-R1 | 2026.01 | 37.6 | 51.5
Qwen3-8B (model size: 8B) | 2026.01 | 20 | 26.7