
Tool Use Evaluation on ShortcutsBench (clear instructions, 200 queries)

API Selection Accuracy: 92.5

GPT-4o

[Chart: API Selection Accuracy; y-axis ticks 83.14, 85.57, 88, 90.43; x-axis Aug 31, 2024]
Updated 1mo ago

Evaluation Results

Method   Date      API Selection Accuracy   Second score (unlabeled)   Links
-        2024.08   92.5                     42.5                       -
-        2024.08   91.5                     53.5                       -
-        2024.08   89                       53.5                       -
-        2024.08   89                       46                         -
-        2024.08   88.5                     50                         -
-        2024.08   87                       46                         -
-        2024.08   86                       45                         -
-        2024.08   83.5                     46                         -
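
Every score above is a multiple of 0.5, which fits a 200-query test set where each query is worth 0.5 percentage points. The sketch below shows how a score such as 92.5 could arise, assuming API Selection Accuracy is the percentage of queries for which the selected API matches the reference API. That definition, the function name `api_selection_accuracy`, and the sample data are illustrative assumptions, not taken from the benchmark.

```python
def api_selection_accuracy(predicted, reference):
    """Percentage of queries whose predicted API equals the reference API.

    Assumed definition for illustration; the benchmark's own scoring
    rules may differ.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one query is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


# Hypothetical data: 200 queries, 185 of them answered with the right API.
reference = [f"api_{i}" for i in range(200)]
predicted = reference[:185] + ["wrong_api"] * 15

score = api_selection_accuracy(predicted, reference)
print(score)  # 92.5
```

Under this assumed definition, a one-query difference between two systems moves the score by exactly 0.5 points.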