Tool Use Accuracy on Seen Tools
[Chart: SRt by date. Labeled point: ChatHuman, 100, May 7, 2024. Selectable metrics: SRt, SRact, SRargs, SR, mIoU.]
Updated 4d ago
Evaluation Results
| Method | Variant | Date | SRt | SRact | SRargs | SR | mIoU |
|---|---|---|---|---|---|---|---|
| ChatHuman | | 2024.05 | 100 | 97.3 | 95.1 | 96.6 | 97.4 |
| ChatHuman | Backbone=LLaVA-1.5-7B | 2024.05 | 100 | 97.4 | 95 | 97 | 97.5 |
| Ours w/ GPT-4 | LLM Backbone=GPT-4 | 2024.05 | 95.3 | 92 | 73.2 | 75.1 | 87.5 |
| Visual ChatGPT-4 | Model Version=GPT-4 | 2024.05 | 89.2 | 80.2 | 71.5 | 75.3 | 79.7 |
| GPT4Tools-FT | Fine-tuned=true | 2024.05 | 82.5 | 71 | 68.7 | 69 | 74.1 |
| GPT4Tools | Fine-tuned=false | 2024.05 | 60.9 | 54.7 | 52.5 | 52 | 56.6 |
| Visual ChatGPT-3.5 | Model Version=GPT-3.5 | 2024.05 | 49.8 | 31.9 | 23.7 | 25.1 | 79.1 |