Tool Learning on StableToolBench I1-Tool

Top result: GPT-3.5 (DFSDT), 73.9 SoPR

Updated 4mo ago

Evaluation Results

Method	Date	SoPR	SoWR
GPT-3.5 (DFSDT)	2025.01	73.9	65.8
GPT-4 (DFSDT)	2025.01	69.6	66.5
GPT-4 (Parallel)	2025.01	67.4	61.4
GPT-3.5 (Parallel)	2025.01	65	55.7
DTA-Llama	2025.01	64.2	53.2
Qwen2.5 (Parallel)	2025.01	58.8	51
ToolLLaMA (DFSDT)	2025.01	55.5	46.8
GPT-3.5 (ReAct)	2025.01	53	-
GPT-4 (ReAct)	2025.01	44.1	60.1
ToolLLaMA† (DFSDT)	2025.01	39.9	37.3
ToolLLaMA (ReAct)	2025.01	35.4	36.1
LLMCompiler	2025.01	35.1	36
ToolLLaMA† (ReAct)	2025.01	25	27.2
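
The table is ordered by its first score column (SoPR), and ordering by the second score column gives a somewhat different ranking. The following is a minimal sketch of re-ranking the rows, not part of the benchmark tooling. The second column is taken here to be SoWR, which is an assumption where the page does not label it. The names `ROWS` and `rank_by` are hypothetical, and the `-` entry is stored as `None`.

```python
# Rows copied from the results table: (method, first score, second score).
# Assumption: the first score is SoPR and the second is SoWR.
# The "-" entry for GPT-3.5 (ReAct) is represented as None.
ROWS = [
    ("GPT-3.5 (DFSDT)", 73.9, 65.8),
    ("GPT-4 (DFSDT)", 69.6, 66.5),
    ("GPT-4 (Parallel)", 67.4, 61.4),
    ("GPT-3.5 (Parallel)", 65.0, 55.7),
    ("DTA-Llama", 64.2, 53.2),
    ("Qwen2.5 (Parallel)", 58.8, 51.0),
    ("ToolLLaMA (DFSDT)", 55.5, 46.8),
    ("GPT-3.5 (ReAct)", 53.0, None),
    ("GPT-4 (ReAct)", 44.1, 60.1),
    ("ToolLLaMA† (DFSDT)", 39.9, 37.3),
    ("ToolLLaMA (ReAct)", 35.4, 36.1),
    ("LLMCompiler", 35.1, 36.0),
    ("ToolLLaMA† (ReAct)", 25.0, 27.2),
]


def rank_by(rows, column):
    """Return method names sorted by score column 1 or 2, best first.

    Rows with no value in that column are skipped rather than ranked.
    """
    scored = [row for row in rows if row[column] is not None]
    ordered = sorted(scored, key=lambda row: row[column], reverse=True)
    return [row[0] for row in ordered]


if __name__ == "__main__":
    print("By first score: ", rank_by(ROWS, 1)[:3])
    print("By second score:", rank_by(ROWS, 2)[:3])
```

Skipping rows without a score, instead of treating `-` as zero, keeps GPT-3.5 (ReAct) from being ranked last on a metric for which the page reports no value.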