Tool Learning on StableToolBench I1-Inst.

Top result: GPT-4 (DFSDT), SoPR 69

Updated 4mo ago

Evaluation Results

Method	Date	SoPR	SoWR
GPT-4 (DFSDT)	2025.01	69	57.1
Qwen2.5 (Parallel)	2025.01	65.7	54
GPT-3.5 (Parallel)	2025.01	64.6	48.5
GPT-3.5 (DFSDT)	2025.01	63.8	58.9
DTA-Llama	2025.01	63.5	52.1
GPT-4 (Parallel)	2025.01	62.9	66.3
ToolLLAMA (DFSDT)	2025.01	56.6	39.9
GPT-4 (ReAct)	2025.01	54.4	53.4
GPT-3.5 (ReAct)	2025.01	53	-
ToolLLAMA (ReAct)	2025.01	42.7	36.2
LLMCompiler	2025.01	39.2	35
ToolLLaMA† (DFSDT)	2025.01	31.8	35.6
ToolLLaMA† (ReAct)	2025.01	26.7	22.1