GAIA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General AI Assistant Tasks | GAIA | Accuracy: 87.4 | 274 |
| General AI Assistant tasks | GAIA | Avg Performance: 80.61 | 54 |
| Agentic Evaluation | GAIA | Accuracy: 28.12 | 50 |
| Deep search | gaia | Accuracy: 81.9 | 43 |
| General AI Assistant Task | GAIA (val) | Level 1 Score: 94.3 | 43 |
| General AI Assistant tasks | GAIA | Pass@1 Score: 70.5 | 26 |
| Agentic Benchmarks | GAIA | Execution Time (min): 1.6 | 25 |
| Deep Search | GAIA text-only (val) | Accuracy: 70.9 | 24 |
| Deep research | GAIA | Accuracy: 78.2 | 24 |
| Embodied Agentic | GAIA | Accuracy: 0.672 | 21 |
| Deep Research | GAIA text-only original (test) | Pass@1: 74.1 | 20 |
| General AI Assistant | GAIA text-only | Score: 81.9 | 19 |
| General AI Assistant | GAIA text | GAIA Average Score: 70.5 | 19 |
| Long-Horizon Search Intelligence | GAIA | Pass@1: 57.3 | 18 |
| Multi-turn tool use | GAIA | Pass@1: 76.4 | 18 |
| Question Answering | GAIA | Accuracy (Pass@4): 51 | 18 |
| General AI Assistant Reasoning | GAIA Full | Accuracy: 60.12 | 18 |
| General AI Assistant Reasoning | GAIA (File/Reasoning/Others) | Accuracy: 56.21 | 18 |
| General AI Assistant Reasoning | GAIA (Web) | Accuracy: 63.33 | 18 |
| General AI Assistant Reasoning | GAIA | Pass@1 Accuracy: 67.4 | 17 |
| Agentic Reasoning | GAIA (val) | Average Score: 86.06 | 17 |
| Inference Time Consumption | GAIA | Latency (Research And Data): 11.2 | 16 |
| Information-Seeking | GAIA 103-question text-only | Pass@1: 75.7 | 16 |
| Deep Research | GAIA | Pass@1: 51.46 | 16 |
| Deep Research | GAIA | Pass@1: 70.5 | 15 |
Showing 25 of 83 rows