Tool Use Evaluation on ShortcutsBench (clear instructions, 200 queries)
[Chart: API Selection Accuracy. GPT-4o: 92.5 (Aug 31, 2024). Metric toggles: API Selection Accuracy, API Call Accuracy.]
Updated 1mo ago
Evaluation Results
| Method      | Framework  | Date    | API Selection Accuracy | API Call Accuracy |
|-------------|------------|---------|------------------------|-------------------|
| GPT-4o      | AwN        | 2024.08 | 92.5                   | 42.5              |
| DeepSeek V3 | AwN        | 2024.08 | 91.5                   | 53.5              |
| DeepSeek V3 | Base Model | 2024.08 | 89                     | 53.5              |
| Claude 3.5  | Base Model | 2024.08 | 89                     | 46                |
| Claude 3.5  | AwN        | 2024.08 | 88.5                   | 50                |
| Gemini 1.5  | AwN        | 2024.08 | 87                     | 46                |
| GPT-4o      | Base Model | 2024.08 | 86                     | 45                |
| Gemini 1.5  | Base Model | 2024.08 | 83.5                   | 46                |
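The results above pair each model with two frameworks, so a natural comparison is the AwN score minus the Base Model score for each metric. The following is a minimal sketch of that arithmetic, assuming the scores are transcribed by hand into a dictionary; the names `RESULTS` and `framework_delta` are hypothetical and are not part of the leaderboard or of ShortcutsBench.

```python
# Scores transcribed from the leaderboard rows:
# (model, framework) -> (API Selection Accuracy, API Call Accuracy)
RESULTS = {
    ("GPT-4o", "AwN"): (92.5, 42.5),
    ("GPT-4o", "Base Model"): (86.0, 45.0),
    ("DeepSeek V3", "AwN"): (91.5, 53.5),
    ("DeepSeek V3", "Base Model"): (89.0, 53.5),
    ("Claude 3.5", "AwN"): (88.5, 50.0),
    ("Claude 3.5", "Base Model"): (89.0, 46.0),
    ("Gemini 1.5", "AwN"): (87.0, 46.0),
    ("Gemini 1.5", "Base Model"): (83.5, 46.0),
}


def framework_delta(results, model):
    """Return (selection_delta, call_delta): AwN score minus Base Model score."""
    awn_sel, awn_call = results[(model, "AwN")]
    base_sel, base_call = results[(model, "Base Model")]
    return awn_sel - base_sel, awn_call - base_call


if __name__ == "__main__":
    # Print one line per model, sorted by name for a stable order.
    for model in sorted({m for m, _ in RESULTS}):
        d_sel, d_call = framework_delta(RESULTS, model)
        print(f"{model:12s} selection {d_sel:+.1f}  call {d_call:+.1f}")
```

On these figures, AwN raises API Selection Accuracy for three of the four models (Claude 3.5 is the exception, at -0.5), while the change in API Call Accuracy is mixed: up for Claude 3.5, down for GPT-4o, and unchanged for DeepSeek V3 and Gemini 1.5.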