General AI Assistants Evaluation on GAIA n=50
Top result: GPT-5 search, Accuracy 0.5944
[Chart: metric value by date (Dec 7, 2025); selectable metrics: Accuracy, 95% CI, Between-Rater Variance, ICC]
Updated 4d ago
Evaluation Results
| Method | Settings | Date | Accuracy | 95% CI | Between-Rater Variance | ICC |
|---|---|---|---|---|---|---|
| GPT-5 search | Web search=true, Trial... | 2025.12 | 0.5944 | - | 0.183 | 0.745 |
| Claude 4.5 Sonnet | Web search=true, Trial... | 2025.12 | 0.3971 | - | 0.184 | 0.756 |
| Claude 4.5 Haiku | Web search=true, Trial... | 2025.12 | 0.3226 | - | 0.164 | 0.738 |
| Gemini 2.5 Pro | Web search=false, Tria... | 2025.12 | 0.3128 | - | 0.18 | 0.772 |
| Deepseek-v3p1 | Web search=false, Tria... | 2025.12 | 0.2228 | - | 0.123 | 0.699 |
| GPT-4o search | Web search=true, Trial... | 2025.12 | 0.2159 | - | 0.115 | 0.671 |
| Qwen3-235b-a22b | Web search=false, Tria... | 2025.12 | 0.1297 | - | 0.085 | 0.736 |
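The 95% CI column is empty ("-") for every method listed above. As a rough illustration only, the sketch below computes a Wilson score interval for each reported accuracy, under the assumption that the n=50 tasks can be treated as independent pass/fail trials. The page does not say how its own intervals would be computed, and the reported accuracies appear to be averaged over trials, so these numbers are an approximation and not the benchmark's values. The helper name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed over n items.

    Assumption: the n items are independent Bernoulli trials. z = 1.96
    corresponds to an approximate 95% confidence level.
    """
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Accuracy values copied from the results above; n = 50 comes from the page title.
N_TASKS = 50
accuracies = {
    "GPT-5 search": 0.5944,
    "Claude 4.5 Sonnet": 0.3971,
    "Claude 4.5 Haiku": 0.3226,
    "Gemini 2.5 Pro": 0.3128,
    "Deepseek-v3p1": 0.2228,
    "GPT-4o search": 0.2159,
    "Qwen3-235b-a22b": 0.1297,
}

for method, acc in accuracies.items():
    lo, hi = wilson_interval(acc, N_TASKS)
    print(f"{method:18s} acc={acc:.4f}  approx. 95% CI=({lo:.3f}, {hi:.3f})")
```

With only 50 tasks the intervals are wide, so small accuracy gaps between neighboring methods should be read with caution.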