General AI Assistant Reasoning on GAIA
[Chart: Pass@1 Accuracy over time. Top entry: OPENAI DEEPRESEARCH, 67.4, Mar 7, 2026.]
Evaluation Results
Method                   Category           Date      Pass@1 Accuracy (%)
OPENAI DEEPRESEARCH      CLOSED-SOURCE      2026.03   67.4
MIRO-30B + WEDAS         OPEN-SOURCE        2026.03   66.99
MIRO-30B + MIROFLOW      OPEN-SOURCE        2026.03   63.11
GPT-5-MINI + WEDAS       OPEN-SOURCE        2026.03   57.28
WEBSAILOR-72B            OPEN-SOURCE        2026.03   55.4
WEBSAILOR-32B            OPEN-SOURCE        2026.03   53.2
ASEARCHER-WEB-32B        OPEN-SOURCE        2026.03   52.8
WEBDANCER-QWQ-32B        OPEN-SOURCE        2026.03   51.5
GPT-5-MINI + MIROFLOW    OPEN-SOURCE        2026.03   51.46
WEBTHINKER-32B-RL        OPEN-SOURCE        2026.03   48.5
SEARCH-O1-32B            OPEN-SOURCE        2026.03   39.8
O4-MINI                  DIRECT INFERENCE   2026.03   33.3
GPT-4.1                  DIRECT INFERENCE   2026.03   22.3
QWQ-32B                  DIRECT INFERENCE   2026.03   22.3
GPT-4O                   DIRECT INFERENCE   2026.03   17.5
QWEN-2.5-72B             DIRECT INFERENCE   2026.03   14.6
QWEN-2.5-32B             DIRECT INFERENCE   2026.03   13.6