Tool Learning on StableToolBench I1-Inst.
[Chart: best score over time (metrics: SoPR, SoWR). Top SoPR entry: GPT-4 (DFSDT), 69, Jan 21, 2025.]
Evaluation Results
SoPR = Solvable Pass Rate; SoWR = Solvable Win Rate (both in %). Configurations are truncated as in the source listing.

| Method | Configuration | Date | SoPR (%) | SoWR (%) |
| --- | --- | --- | --- | --- |
| GPT-4 (DFSDT) | Backbone=GPT-4, Reason... | 2025.01 | 69 | 57.1 |
| Qwen2.5 (Parallel) | Backbone=Qwen2.5-7B-In... | 2025.01 | 65.7 | 54 |
| GPT-3.5 (Parallel) | Backbone=GPT-3.5, Reas... | 2025.01 | 64.6 | 48.5 |
| GPT-3.5 (DFSDT) | Backbone=GPT-3.5, Reas... | 2025.01 | 63.8 | 58.9 |
| DTA-Llama | Backbone=Llama-2-7B, R... | 2025.01 | 63.5 | 52.1 |
| GPT-4 (Parallel) | Backbone=GPT-4, Reason... | 2025.01 | 62.9 | 66.3 |
| ToolLLaMA (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 56.6 | 39.9 |
| GPT-4 (ReAct) | Backbone=GPT-4, Reason... | 2025.01 | 54.4 | 53.4 |
| GPT-3.5 (ReAct) | Backbone=GPT-3.5, Reas... | 2025.01 | 53 | - |
| ToolLLaMA (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 42.7 | 36.2 |
| LLMCompiler | Backbone=Llama-2-7B, R... | 2025.01 | 39.2 | 35 |
| ToolLLaMA† (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 31.8 | 35.6 |
| ToolLLaMA† (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 26.7 | 22.1 |
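The two metrics do not rank methods the same way. For example, GPT-4 (Parallel) has the highest SoWR (66.3) but is sixth on SoPR, so it helps to sort each column separately. The Python sketch below is a minimal illustration and not part of StableToolBench's own tooling. The rows are hand-transcribed from the table, and the names `ROWS`, `rank_by` and `gap` are hypothetical. It also compares DTA-Llama with ToolLLaMA (DFSDT), both of which the table lists with a Llama-2-7B backbone.

```python
# Hypothetical helper: rows hand-transcribed from the leaderboard table above.
# Each tuple is (method, SoPR, SoWR); None marks the missing "-" SoWR entry.
ROWS = [
    ("GPT-4 (DFSDT)", 69.0, 57.1),
    ("Qwen2.5 (Parallel)", 65.7, 54.0),
    ("GPT-3.5 (Parallel)", 64.6, 48.5),
    ("GPT-3.5 (DFSDT)", 63.8, 58.9),
    ("DTA-Llama", 63.5, 52.1),
    ("GPT-4 (Parallel)", 62.9, 66.3),
    ("ToolLLaMA (DFSDT)", 56.6, 39.9),
    ("GPT-4 (ReAct)", 54.4, 53.4),
    ("GPT-3.5 (ReAct)", 53.0, None),
    ("ToolLLaMA (ReAct)", 42.7, 36.2),
    ("LLMCompiler", 39.2, 35.0),
    ("ToolLLaMA† (DFSDT)", 31.8, 35.6),
    ("ToolLLaMA† (ReAct)", 26.7, 22.1),
]

SOPR, SOWR = 1, 2  # column indices into each row tuple


def rank_by(metric_index: int) -> list[str]:
    """Return method names sorted best-first on one metric, skipping missing values."""
    scored = [row for row in ROWS if row[metric_index] is not None]
    scored.sort(key=lambda row: row[metric_index], reverse=True)
    return [row[0] for row in scored]


def gap(method_a: str, method_b: str, metric_index: int) -> float:
    """Score of method_a minus score of method_b, rounded to one decimal."""
    lookup = {row[0]: row for row in ROWS}
    return round(lookup[method_a][metric_index] - lookup[method_b][metric_index], 1)


sopr_rank = rank_by(SOPR)
sowr_rank = rank_by(SOWR)
print("Top by SoPR:", sopr_rank[0])  # Top by SoPR: GPT-4 (DFSDT)
print("Top by SoWR:", sowr_rank[0])  # Top by SoWR: GPT-4 (Parallel)
print("DTA-Llama vs ToolLLaMA (DFSDT), SoPR:", gap("DTA-Llama", "ToolLLaMA (DFSDT)", SOPR))  # 6.9
print("DTA-Llama vs ToolLLaMA (DFSDT), SoWR:", gap("DTA-Llama", "ToolLLaMA (DFSDT)", SOWR))  # 12.2
```

Rounding to one decimal in `gap` hides floating-point noise (e.g. 63.5 - 56.6 is not exactly 6.9 in binary floating point) and matches the table's precision.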