Tool Learning on StableToolBench I3-Inst.
[Chart: SoPR / SoWR. Top SoPR: 76, GPT-4 (DFSDT), Jan 21, 2025.]
Updated 4d ago
Evaluation Results
| Method | Method details | Date | SoPR | SoWR |
|---|---|---|---|---|
| GPT-4 (DFSDT) | Backbone=GPT-4, Reason... | 2025.01 | 76 | 63.9 |
| GPT-3.5 (DFSDT) | Backbone=GPT-3.5, Reas... | 2025.01 | 69.9 | 67.2 |
| GPT-4 (Parallel) | Backbone=GPT-4, Reason... | 2025.01 | 69.7 | 70.5 |
| Qwen2.5 (Parallel) | Backbone=Qwen2.5-7B-In... | 2025.01 | 68.3 | 57.6 |
| DTA-Llama | Backbone=Llama-2-7B, R... | 2025.01 | 67.5 | 59 |
| GPT-3.5 (Parallel) | Backbone=GPT-3.5, Reas... | 2025.01 | 56.6 | 50.8 |
| ToolLLaMA (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 53.6 | 50.8 |
| GPT-3.5 (ReAct) | Backbone=GPT-3.5, Reas... | 2025.01 | 48.6 | - |
| GPT-4 (ReAct) | Backbone=GPT-4, Reason... | 2025.01 | 42.6 | 54.1 |
| ToolLLaMA† (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 33.3 | 26.2 |
| ToolLLaMA (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 29.8 | 41 |
| LLMCompiler | Backbone=Llama-2-7B, R... | 2025.01 | 27 | 36.5 |
| ToolLLaMA† (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 20.5 | 24.6 |