General Assistant Tasks on GAIA L1
[Chart: Accuracy over time. Top result: ExComm, 69.5. Data point dated May 21, 2026.]
Updated 12d ago
Evaluation Results
Method | Settings (truncated in source) | Date | Accuracy
ExComm | Backbone=Qwen2.5-7B, N... | 2026.05 | 69.5
Independent Scaling | Backbone=Qwen2.5-7B, N... | 2026.05 | 61.7
Independent Scaling + SR | Backbone=Qwen2.5-7B, N... | 2026.05 | 60.8
Tree Search | Backbone=Qwen2.5-7B, N... | 2026.05 | 58.7
Sequential Revision (SR) | Backbone=Qwen2.5-7B, N... | 2026.05 | 54
Base Agent | Backbone=Qwen2.5-7B, N... | 2026.05 | 53.7
ExComm | Backbone=Gemini-2.5-Fl... | 2026.05 | 52.2
Independent Scaling | Backbone=Gemini-2.5-Fl... | 2026.05 | 44
Independent Scaling + SR | Backbone=Gemini-2.5-Fl... | 2026.05 | 42.7
Tree Search | Backbone=Gemini-2.5-Fl... | 2026.05 | 41
Base Agent | Backbone=Gemini-2.5-Fl... | 2026.05 | 35.8
Sequential Revision (SR) | Backbone=Gemini-2.5-Fl... | 2026.05 | 35.2