
Agent Planning and API Calling on ToolBench Out of the domain

Plan Accuracy: 83.9

GPT-4o

[Chart omitted; axis ticks 26.388, 41.319, 56.25, 71.181; date shown: Jun 24, 2025]
Updated 17d ago

Evaluation Results

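Method   Date     Plan Accuracy  Metric 2  Metric 3  Metric 4  Metric 5  Metric 6  Metric 7
GPT-4o   2025.06  83.9           63.2      42.9      22.3      44.1      99.8      59.3
-        -        77.2           56.1      32        21.9      38.2      100       54.2
-        -        74.6           52        28.3      20.4      35.6      99.8      51.8
-        2025.06  73.4           57.5      31.1      37        48.9      96.9      57.5
-        2025.06  72.3           57        32.3      36.1      48.7      96.3      57.1
-        2025.06  59.7           41.1      19.7      25.9      36.5      97.3      46.7
-        2025.06  43.9           16.6      7.2       6.2       12.3      99.9      31
-        2025.06  28.6           0         0         0         0         100       21.4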