Tool Learning on StableToolBench Average

70.3 SoPR

GPT-4 (DFSDT)

Updated 4mo ago

Evaluation Results

Method	Date	SoPR	SoWR
GPT-4 (DFSDT)	2025.01	70.3	64.2
GPT-4 (Parallel)	2025.01	69.2	70.7
GPT-3.5 (DFSDT)	2025.01	66.7	65.5
DTA-Llama	2025.01	66.1	59.1
Qwen2.5 (Parallel)	2025.01	63	55.3
GPT-3.5 (Parallel)	2025.01	61.9	53
ToolLLaMA (DFSDT)	2025.01	54.2	47.1
GPT-4 (ReAct)	2025.01	48.2	58.7
GPT-3.5 (ReAct)	2025.01	47.9	-
ToolLLaMA† (DFSDT)	2025.01	39.2	37.6
ToolLLaMA (ReAct)	2025.01	37.9	39.3
LLMCompiler	2025.01	36.2	37.9
ToolLLaMA† (ReAct)	2025.01	25.3	27.3