General AI Assistant tasks on GAIA 2

Score: 43.7

Claude-Sonnet-4.5

Updated 4mo ago

Evaluation Results

Method	Links	Score
Claude-Sonnet-4.5 2025.12		43.7
GPT-5 2025.12		42.1
DeepSeek-V3.1-Nex-N1 2025.12		29.5
Gemini-2.5-pro 2025.12		25.8
Kimi-K2-thinking 2025.12		24.9
DeepSeek-V3.1 2025.12		21.9
Minimax-M2 2025.12		18.3
GLM-4.6 2025.12		17.1
Qwen3-32B-Nex-N1 2025.12		16.7
Qwen3-30B-A3B-Nex-N1 2025.12		11.3
Qwen3-32B 2025.12		10.8
InternLM3-8B-Nex-N1 2025.12		8.6
Qwen3-30B-A3B 2025.12		5.1
InternLM3-8B 2025.12		0.7