General Assistant Tasks on GAIA L2
[Chart: Accuracy over time. Top entry: ExComm, 49.8 Accuracy, May 21, 2026.]
Updated 12d ago
Evaluation Results
Method                    Configuration              Date     Accuracy
ExComm                    Backbone=Qwen2.5-7B, N...  2026.05  49.8
Independent Scaling       Backbone=Qwen2.5-7B, N...  2026.05  46.8
Tree Search               Backbone=Qwen2.5-7B, N...  2026.05  45.3
Independent Scaling + SR  Backbone=Qwen2.5-7B, N...  2026.05  43.7
Base Agent                Backbone=Qwen2.5-7B, N...  2026.05  38.8
Sequential Revision (SR)  Backbone=Qwen2.5-7B, N...  2026.05  36.7
ExComm                    Backbone=Gemini-2.5-Fl...  2026.05  30.8
Tree Search               Backbone=Gemini-2.5-Fl...  2026.05  30
Independent Scaling + SR  Backbone=Gemini-2.5-Fl...  2026.05  29.2
Independent Scaling       Backbone=Gemini-2.5-Fl...  2026.05  28.5
Sequential Revision (SR)  Backbone=Gemini-2.5-Fl...  2026.05  21.8
Base Agent                Backbone=Gemini-2.5-Fl...  2026.05  21.5