Tool Learning on StableToolBench Average
Chart: SoPR over time (metrics shown: SoPR, SoWR). Top result: GPT-4 (DFSDT), SoPR 70.3, Jan 21, 2025.
Evaluation Results
SoPR = Solvable Pass Rate, SoWR = Solvable Win Rate (both in %). A "-" marks a value not reported.

| Method | Setting | Date | SoPR | SoWR |
| --- | --- | --- | --- | --- |
| GPT-4 (DFSDT) | Backbone=GPT-4, Reason... | 2025.01 | 70.3 | 64.2 |
| GPT-4 (Parallel) | Backbone=GPT-4, Reason... | 2025.01 | 69.2 | 70.7 |
| GPT-3.5 (DFSDT) | Backbone=GPT-3.5, Reas... | 2025.01 | 66.7 | 65.5 |
| DTA-Llama | Backbone=Llama-2-7B, R... | 2025.01 | 66.1 | 59.1 |
| Qwen2.5 (Parallel) | Backbone=Qwen2.5-7B-In... | 2025.01 | 63.0 | 55.3 |
| GPT-3.5 (Parallel) | Backbone=GPT-3.5, Reas... | 2025.01 | 61.9 | 53.0 |
| ToolLLaMA (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 54.2 | 47.1 |
| GPT-4 (ReAct) | Backbone=GPT-4, Reason... | 2025.01 | 48.2 | 58.7 |
| GPT-3.5 (ReAct) | Backbone=GPT-3.5, Reas... | 2025.01 | 47.9 | - |
| ToolLLaMA† (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 39.2 | 37.6 |
| ToolLLaMA (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 37.9 | 39.3 |
| LLMCompiler | Backbone=Llama-2-7B, R... | 2025.01 | 36.2 | 37.9 |
| ToolLLaMA† (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 25.3 | 27.3 |
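To compare reasoning strategies on the same backbone, the rows can be loaded and the SoPR gain of DFSDT over ReAct computed directly. The following is a minimal Python sketch, assuming the scores are transcribed by hand from the table. `LEADERBOARD`, `sopr_gain`, and `rank_by` are hypothetical helpers for illustration, not part of any leaderboard API.

```python
from typing import Optional

# Hypothetical hand transcription of the table: method -> (SoPR, SoWR).
# None marks the "-" (unreported) SoWR entry.
LEADERBOARD: dict[str, tuple[float, Optional[float]]] = {
    "GPT-4 (DFSDT)": (70.3, 64.2),
    "GPT-4 (Parallel)": (69.2, 70.7),
    "GPT-3.5 (DFSDT)": (66.7, 65.5),
    "DTA-Llama": (66.1, 59.1),
    "Qwen2.5 (Parallel)": (63.0, 55.3),
    "GPT-3.5 (Parallel)": (61.9, 53.0),
    "ToolLLaMA (DFSDT)": (54.2, 47.1),
    "GPT-4 (ReAct)": (48.2, 58.7),
    "GPT-3.5 (ReAct)": (47.9, None),
    "ToolLLaMA† (DFSDT)": (39.2, 37.6),
    "ToolLLaMA (ReAct)": (37.9, 39.3),
    "LLMCompiler": (36.2, 37.9),
    "ToolLLaMA† (ReAct)": (25.3, 27.3),
}


def sopr_gain(method: str, baseline: str) -> float:
    """SoPR difference in percentage points, rounded to one decimal."""
    return round(LEADERBOARD[method][0] - LEADERBOARD[baseline][0], 1)


def rank_by(metric: int) -> list[str]:
    """Method names best-first by SoPR (metric=0) or SoWR (metric=1).

    Entries with a missing value for the chosen metric are skipped.
    """
    scored = [
        (name, scores[metric])
        for name, scores in LEADERBOARD.items()
        if scores[metric] is not None
    ]
    return [name for name, _ in sorted(scored, key=lambda item: item[1], reverse=True)]


if __name__ == "__main__":
    # Compare DFSDT against ReAct for every backbone that has both rows.
    for backbone in ("GPT-4", "GPT-3.5", "ToolLLaMA", "ToolLLaMA†"):
        gain = sopr_gain(f"{backbone} (DFSDT)", f"{backbone} (ReAct)")
        print(f"{backbone}: DFSDT over ReAct = {gain:+.1f} SoPR")
    print("Top by SoPR:", rank_by(0)[0])
    print("Top by SoWR:", rank_by(1)[0])
```

On these numbers, DFSDT beats ReAct in SoPR on every backbone: +22.1 (GPT-4), +18.8 (GPT-3.5), +16.3 (ToolLLaMA), +13.9 (ToolLLaMA†). The two metrics also pick different leaders: GPT-4 (DFSDT) for SoPR and GPT-4 (Parallel) for SoWR.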