Tool Use on API-Bank (test)
[Chart: Accuracy over time, Jul 31, 2024 to Mar 2, 2026. Top reported result: Claude 3.5 Sonnet, 92.6 Accuracy.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| Claude 3.5 Sonnet | evaluation_mode=zero-shot | 2024.07 | 92.6 |
| Llama 3 405B | evaluation_mode=zero-shot | 2024.07 | 92.3 |
| GPT-4o | evaluation_mode=zero-shot | 2024.07 | 91.3 |
| Llama 3 70B | evaluation_mode=zero-shot | 2024.07 | 90 |
| GPT-4 | evaluation_mode=zero-shot | 2024.07 | 89 |
| Llama 3 8B | evaluation_mode=zero-shot | 2024.07 | 82.6 |
| Mixtral 8x22B | evaluation_mode=zero-shot | 2024.07 | 73.1 |
| ToolRLA | Backbone=Qwen3-14B | 2026.03 | 71.8 |
| GPT-4 (function calling) | | 2026.03 | 67.1 |
| Bloomberg AI Engineering | | 2026.03 | 66.7 |
| AvaTaR | | 2026.03 | 63.5 |
| GPT-3.5 Turbo | evaluation_mode=zero-shot | 2024.07 | 60.9 |
| Gemma 2 9B | evaluation_mode=zero-shot | 2024.07 | 56.5 |
| Mistral 7B | evaluation_mode=zero-shot | 2024.07 | 55.8 |
| ToolLLM | | 2026.03 | 52.4 |
| Gorilla | | 2026.03 | 38.7 |