General Assistant Tasks on GAIA L3
[Chart: Accuracy by date (May 21, 2026); top result ExComm, 28.7.]
Evaluation Results
Method                    Configuration              Date     Accuracy
ExComm                    Backbone=Qwen2.5-7B, N...  2026.05  28.7
Independent Scaling + SR  Backbone=Qwen2.5-7B, N...  2026.05  26.5
Sequential Revision (SR)  Backbone=Qwen2.5-7B, N...  2026.05  21.0
Independent Scaling       Backbone=Qwen2.5-7B, N...  2026.05  20.1
Tree Search               Backbone=Qwen2.5-7B, N...  2026.05  16.7
Base Agent                Backbone=Qwen2.5-7B, N...  2026.05  16.4
ExComm                    Backbone=Gemini-2.5-Fl...  2026.05  14.8
Tree Search               Backbone=Gemini-2.5-Fl...  2026.05  10.2
Independent Scaling       Backbone=Gemini-2.5-Fl...  2026.05  9.0
Base Agent                Backbone=Gemini-2.5-Fl...  2026.05  5.9
Independent Scaling + SR  Backbone=Gemini-2.5-Fl...  2026.05  5.6
Sequential Revision (SR)  Backbone=Gemini-2.5-Fl...  2026.05  4.9
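The listed scores are easier to compare when each method is measured against the Base Agent on the same backbone. The sketch below does that arithmetic. It transcribes the accuracy values shown on this page and keeps the backbone labels truncated exactly as displayed. The names ROWS and gains_over_base are hypothetical helpers introduced for illustration only.

```python
from collections import defaultdict

# Rows transcribed from the results table: (method, backbone label, accuracy).
# Backbone labels are kept truncated exactly as the page shows them.
ROWS = [
    ("ExComm", "Qwen2.5-7B", 28.7),
    ("Independent Scaling + SR", "Qwen2.5-7B", 26.5),
    ("Sequential Revision (SR)", "Qwen2.5-7B", 21.0),
    ("Independent Scaling", "Qwen2.5-7B", 20.1),
    ("Tree Search", "Qwen2.5-7B", 16.7),
    ("Base Agent", "Qwen2.5-7B", 16.4),
    ("ExComm", "Gemini-2.5-Fl...", 14.8),
    ("Tree Search", "Gemini-2.5-Fl...", 10.2),
    ("Independent Scaling", "Gemini-2.5-Fl...", 9.0),
    ("Base Agent", "Gemini-2.5-Fl...", 5.9),
    ("Independent Scaling + SR", "Gemini-2.5-Fl...", 5.6),
    ("Sequential Revision (SR)", "Gemini-2.5-Fl...", 4.9),
]


def gains_over_base(rows, baseline="Base Agent"):
    """Hypothetical helper: map each backbone to {method: accuracy - baseline accuracy}.

    Differences are rounded to one decimal place to match the table's precision.
    """
    by_backbone = defaultdict(dict)
    for method, backbone, acc in rows:
        by_backbone[backbone][method] = acc
    gains = {}
    for backbone, scores in by_backbone.items():
        base = scores[baseline]
        gains[backbone] = {
            m: round(acc - base, 1) for m, acc in scores.items() if m != baseline
        }
    return gains


if __name__ == "__main__":
    # Print methods per backbone, largest gain first.
    for backbone, deltas in gains_over_base(ROWS).items():
        print(backbone)
        for method, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
            print(f"  {method:<26} {delta:+.1f}")
```

With these numbers, ExComm shows the largest gain on both backbones (+12.3 on Qwen2.5-7B, +8.9 on the Gemini backbone). On the Gemini backbone, Sequential Revision (SR) and Independent Scaling + SR score below the Base Agent (-1.0 and -0.3).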