Tool-use Planning on ToolBench: Average over all sets

86.54 Win Rate

GPT4 TOPGUN

[Chart: Win Rate by date. Y-axis ticks: 45.4184, 56.0942, 66.77, 77.4458. X-axis: Feb 15, 2024]
Updated 1mo ago

Evaluation Results

Date       Win Rate   Method   Links
2024.02    86.54
2024.02    84.61
2024.02    80.27
2024.02    79.44
2024.02    78.71
2024.02    78.59
2024.02    78.44
2024.02    70.4
2024.02    64.3
2024.02    64
2024.02    63.1
2024.02    60
2024.02    47