General AI Assistant tasks on GAIA 2
Best score: 43.7 (Claude-Sonnet-4.5, Dec 4, 2025)
Evaluation Results
Method                Model Type      Date     Score
Claude-Sonnet-4.5     Proprietary     2025.12  43.7
GPT-5                 Proprietary     2025.12  42.1
DeepSeek-V3.1-Nex-N1  Open Source...  2025.12  29.5
Gemini-2.5-pro        Proprietary     2025.12  25.8
Kimi-K2-thinking      Open Source...  2025.12  24.9
DeepSeek-V3.1         Open Source...  2025.12  21.9
Minimax-M2            Open Source...  2025.12  18.3
GLM-4.6               Open Source...  2025.12  17.1
Qwen3-32B-Nex-N1      Open Source...  2025.12  16.7
Qwen3-30B-A3B-Nex-N1  Open Source...  2025.12  11.3
Qwen3-32B             Open Source...  2025.12  10.8
InternLM3-8B-Nex-N1   Open Source...  2025.12  8.6
Qwen3-30B-A3B         Open Source...  2025.12  5.1
InternLM3-8B          Open Source...  2025.12  0.7