Tool Learning on StableToolBench I1-Tool
Best SoPR: 73.9, GPT-3.5 (DFSDT), Jan 21, 2025
Evaluation Results

| Method | Configuration | Date | SoPR | SoWR |
|---|---|---|---|---|
| GPT-3.5 (DFSDT) | Backbone=GPT-3.5, Reas... | 2025.01 | 73.9 | 65.8 |
| GPT-4 (DFSDT) | Backbone=GPT-4, Reason... | 2025.01 | 69.6 | 66.5 |
| GPT-4 (Parallel) | Backbone=GPT-4, Reason... | 2025.01 | 67.4 | 61.4 |
| GPT-3.5 (Parallel) | Backbone=GPT-3.5, Reas... | 2025.01 | 65 | 55.7 |
| DTA-Llama | Backbone=Llama-2-7B, R... | 2025.01 | 64.2 | 53.2 |
| Qwen2.5 (Parallel) | Backbone=Qwen2.5-7B-In... | 2025.01 | 58.8 | 51 |
| ToolLLAMA (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 55.5 | 46.8 |
| GPT-3.5 (ReAct) | Backbone=GPT-3.5, Reas... | 2025.01 | 53 | - |
| GPT-4 (ReAct) | Backbone=GPT-4, Reason... | 2025.01 | 44.1 | 60.1 |
| ToolLLaMA† (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 39.9 | 37.3 |
| ToolLLAMA (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 35.4 | 36.1 |
| LLMCompiler | Backbone=Llama-2-7B, R... | 2025.01 | 35.1 | 36 |
| ToolLLaMA† (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 25 | 27.2 |