Tool Use on API-Bank (test)
[Chart: Accuracy by date. Top point: 92.6, Claude 3.5 Sonnet, Jul 31, 2024.]
Updated 4d ago
Evaluation Results
Method              Evaluation mode   Date      Accuracy
Claude 3.5 Sonnet   zero-shot         2024.07   92.6
Llama 3 405B        zero-shot         2024.07   92.3
GPT-4o              zero-shot         2024.07   91.3
Llama 3 70B         zero-shot         2024.07   90
GPT-4               zero-shot         2024.07   89
Llama 3 8B          zero-shot         2024.07   82.6
Mixtral 8x22B       zero-shot         2024.07   73.1
GPT-3.5 Turbo       zero-shot         2024.07   60.9
Gemma 2 9B          zero-shot         2024.07   56.5
Mistral 7B          zero-shot         2024.07   55.8