GAIA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General AI Assistant Tasks | GAIA | Accuracy: 93.2 | 291 |
| General AI Assistant Task | GAIA (val) | Level 1 Score: 96.23 | 97 |
| General AI Assistant tasks | GAIA | Avg Performance: 80.61 | 72 |
| Deep search | gaia | Accuracy: 81.9 | 59 |
| Agentic Evaluation | GAIA | Accuracy: 28.12 | 50 |
| General AI Assistant tasks | GAIA | Pass@1 Score: 83.4 | 38 |
| General AI Assistant Tasks | GAIA | Task Success Rate: 71.5 | 30 |
| Reasoning | GAIA text | Average Accuracy: 69.9 | 28 |
| Web Task Reasoning | GAIA (test) | Pass@1: 84.5 | 25 |
| Agentic Benchmarks | GAIA | Execution Time (min): 1.6 | 25 |
| Deep Search | GAIA text-only (val) | Accuracy: 70.9 | 24 |
| Deep research | GAIA | Accuracy: 78.2 | 24 |
| General AI assistant tasks | GAIA n=165 (dev) | Average Accuracy: 73.93 | 23 |
| General AI Assistant Tasks | GAIA | Avg@8 Score: 88.5 | 22 |
| Question Answering | GAIA | Accuracy (Pass@4): 51 | 22 |
| General AI Assistant Task | GAIA | Accuracy: 62 | 21 |
| Embodied Agentic | GAIA | Accuracy: 0.672 | 21 |
| Deep Research | GAIA text-only original (test) | Pass@1: 74.1 | 20 |
| Complex Reasoning | GAIA Text | Accuracy: 76.4 | 19 |
| General AI Assistant | GAIA text-only | Score: 81.9 | 19 |
| General AI Assistant | GAIA text | GAIA Average Score: 70.5 | 19 |
| Long-Horizon Search Intelligence | GAIA | Pass@1: 57.3 | 18 |
| Multi-turn tool use | GAIA | Pass@1: 76.4 | 18 |
| General AI Assistant Reasoning | GAIA Full | Accuracy: 60.12 | 18 |
| General AI Assistant Reasoning | GAIA (File/Reasoning/Others) | Accuracy: 56.21 | 18 |
Showing 25 of 111 rows