End-to-end Tool-use on ToolBench I1-Cat v1
Best reported SoPR: 61.76 (ToolLlama*).
[Chart: SoPR and SoWR results by date, Jan 29, 2026]
Evaluation Results
SoPR = Solvable Pass Rate, SoWR = Solvable Win Rate (both in %, higher is better).

| Method      | Setting           | Date    | SoPR  | SoWR  |
|-------------|-------------------|---------|-------|-------|
| ToolLlama*  | Retrieval         | 2026.01 | 61.76 | 50.98 |
| ToolWeaver  | Direct Generation | 2026.01 | 57.41 | 43.14 |
| ToolWeaver  |                   | 2026.01 | 57.41 | 43.14 |
| ToolGen     | Direct Generation | 2026.01 | 55.56 | 42.48 |
| ToolGen     |                   | 2026.01 | 55.56 | 42.48 |
| GPT-3.5*    | Retrieval         | 2026.01 | 53.05 | 54.25 |
| GPT-4o-mini | Retrieval (Re.)   | 2026.01 | 50.11 | -     |
| GPT-4o-mini | Retrieval         | 2026.01 | 50.11 | 50.33 |
| ToolGen*    |                   | 2026.01 | 49.46 | 39.87 |
| ToolLlama-2 | Retrieval (Re.)   | 2026.01 | 36.93 | 27.45 |
| ToolLlama-2 | Retrieval         | 2026.01 | 36.93 | 27.45 |
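The ranking depends on which metric you sort by: ToolLlama* leads on SoPR, while GPT-3.5* has the highest SoWR. The minimal Python sketch below makes that comparison explicit. The scores are transcribed by hand from the table above. The `ROWS` layout and the `rank_by` helper are illustrative assumptions and not part of any leaderboard API.

```python
from typing import Optional

# Each row is (method, setting, SoPR, SoWR), transcribed from the evaluation table.
# An empty setting means the leaderboard lists the row without one.
# None stands in for a missing score ("-").
Row = tuple[str, str, float, Optional[float]]

ROWS: list[Row] = [
    ("ToolLlama*", "Retrieval", 61.76, 50.98),
    ("ToolWeaver", "Direct Generation", 57.41, 43.14),
    ("ToolWeaver", "", 57.41, 43.14),
    ("ToolGen", "Direct Generation", 55.56, 42.48),
    ("ToolGen", "", 55.56, 42.48),
    ("GPT-3.5*", "Retrieval", 53.05, 54.25),
    ("GPT-4o-mini", "Retrieval (Re.)", 50.11, None),
    ("GPT-4o-mini", "Retrieval", 50.11, 50.33),
    ("ToolGen*", "", 49.46, 39.87),
    ("ToolLlama-2", "Retrieval (Re.)", 36.93, 27.45),
    ("ToolLlama-2", "Retrieval", 36.93, 27.45),
]

SOPR, SOWR = 2, 3  # positions of each metric inside a row tuple


def rank_by(column: int) -> list[tuple[str, str, float]]:
    """Return (method, setting, score) sorted best-first, skipping missing scores."""
    scored = [(row[0], row[1], row[column]) for row in ROWS if row[column] is not None]
    return sorted(scored, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    for name, column in (("SoPR", SOPR), ("SoWR", SOWR)):
        method, setting, score = rank_by(column)[0]
        print(f"Top {name}: {method} ({setting or 'no setting'}) = {score}")
```

Running it prints ToolLlama* (Retrieval) as the SoPR leader and GPT-3.5* (Retrieval) as the SoWR leader. The GPT-4o-mini "Retrieval (Re.)" row is excluded from the SoWR ranking because it has no SoWR score.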