
Tool Learning on StableToolBench I3-Inst.

Top result: GPT-4 (DFSDT), SoPR 76


Evaluation Results

Method              Date     SoPR  SoWR
GPT-4 (DFSDT)       2025.01  76    63.9
GPT-3.5 (DFSDT)     2025.01  69.9  67.2
GPT-4 (Parallel)    2025.01  69.7  70.5
Qwen2.5 (Parallel)  2025.01  68.3  57.6
DTA-Llama           2025.01  67.5  59
GPT-3.5 (Parallel)  2025.01  56.6  50.8
ToolLLaMA (DFSDT)   2025.01  53.6  50.8
GPT-3.5 (ReAct)     2025.01  48.6  -
GPT-4 (ReAct)       2025.01  42.6  54.1
ToolLLaMA† (DFSDT)  2025.01  33.3  26.2
ToolLLaMA (ReAct)   2025.01  29.8  41
LLMCompiler         2025.01  27    36.5
ToolLLaMA† (ReAct)  2025.01  20.5  24.6

SoPR: Solvable Pass Rate (%). SoWR: Solvable Win Rate (%) against the GPT-3.5 (ReAct) reference, which therefore has no SoWR entry. Date: year and month of the paper reporting the result. Rows are sorted by SoPR.