Tool Learning on StableToolBench I2-Cat.
[Chart: SoPR and SoWR over time. Top SoPR: 71.9 (DTA-Llama), Jan 21, 2025.]
Updated 4d ago
Evaluation Results
| Method | Details | Date | SoPR | SoWR |
|---|---|---|---|---|
| DTA-Llama | Backbone=Llama-2-7B, R... | 2025.01 | 71.9 | 65.3 |
| GPT-4 (Parallel) | Backbone=GPT-4, Reason... | 2025.01 | 70.8 | 77.4 |
| GPT-3.5 (DFSDT) | Backbone=GPT-3.5, Reas... | 2025.01 | 69.8 | 68.5 |
| GPT-4 (DFSDT) | Backbone=GPT-4, Reason... | 2025.01 | 68 | 62.9 |
| GPT-3.5 (Parallel) | Backbone=GPT-3.5, Reas... | 2025.01 | 61.4 | 53.2 |
| Qwen2.5 (Parallel) | Backbone=Qwen2.5-7B-In... | 2025.01 | 61.3 | 61.3 |
| ToolLLAMA (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 53.4 | 49.2 |
| GPT-4 (ReAct) | Backbone=GPT-4, Reason... | 2025.01 | 48.9 | 62.1 |
| GPT-3.5 (ReAct) | Backbone=GPT-3.5, Reas... | 2025.01 | 43.9 | - |
| ToolLLAMA (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 40.9 | 38.7 |
| ToolLLaMA† (DFSDT) | Backbone=Llama-2-7B, R... | 2025.01 | 39.1 | 39.5 |
| LLMCompiler | Backbone=Llama-2-7B, R... | 2025.01 | 38.4 | 38.1 |
| ToolLLaMA† (ReAct) | Backbone=Llama-2-7B, R... | 2025.01 | 24.5 | 28.2 |