Agentic Reasoning on GAIA Text
[Chart: accuracy by date. OctoTools: 18.4 (Feb 16, 2025). Series: Accuracy, Delta (0-shot), Delta (CoT).]
Evaluation Results
| Method | Configuration | Links | Accuracy | Delta (0-shot) | Delta (CoT) |
|---|---|---|---|---|---|
| OctoTools | Backbone=gpt-4o-2024-0... | 2025.02 | 18.4 | 9.7 | 10 |
| OctoTools-base | Backbone=gpt-4o-2024-0... | 2025.02 | 9.7 | - | - |
| 0-shot | Backbone=gpt-4o-2024-0... | 2025.02 | 8.7 | - | - |
| CoT | Backbone=gpt-4o-2024-0... | 2025.02 | 8.4 | - | - |
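The page does not define the two delta columns. A reading that matches the reported numbers is that each delta is the OctoTools accuracy minus the accuracy of the named baseline. The sketch below checks that reading against the results above; the `delta` helper and the dictionary layout are illustrative assumptions, not part of the benchmark.

```python
# Accuracy values copied from the results above.
accuracy = {
    "OctoTools": 18.4,
    "0-shot": 8.7,
    "CoT": 8.4,
}


def delta(method: str, baseline: str) -> float:
    """Accuracy gain of `method` over `baseline` (assumed definition of the delta columns)."""
    # Round to one decimal place to match the precision shown on the page
    # and to hide floating-point noise from the subtraction.
    return round(accuracy[method] - accuracy[baseline], 1)


delta_zero_shot = delta("OctoTools", "0-shot")
delta_cot = delta("OctoTools", "CoT")

print(f"Delta (0-shot): {delta_zero_shot}")
print(f"Delta (CoT):    {delta_cot}")
```

Under this assumption the computed values agree with the reported 9.7 and 10, which suggests the delta columns are plain differences in accuracy rather than relative improvements.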