
Agentic Dialogue on τ-Bench (test)

Retail Accuracy: 60.4

GPT-4o-2024-11-20

Updated 4mo ago

Evaluation Results

Method	Links	Retail Accuracy	Airline Accuracy	Average
GPT-4o-2024-11-20 2025.08		60.4	42	51.2
Llama3.1-70B-Inst 2025.08		50.4	26	38.2
ToolACE-MT 2025.08		25.2	16	20.6
ToolACE-MT 2025.08		22.6	6	14.3
Multi-Agent Simulation 2025.08		21.7	10	15.9
ToolACE-MT 2025.08		9.5	6	7.8
Llama3.1-8B-Inst 2025.08		6.1	26	16.1
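
The last score in each row looks like the mean of the two scores before it. The sketch below checks that reading against the numbers in the table. It is only a consistency check under that assumption, not the benchmark's own scoring code. The names `rows` and `matches_mean` are hypothetical, and the tolerance allows for the scores being rounded to one decimal place.

```python
# Scores copied from the table above: (method, first score, second score, last score).
rows = [
    ("GPT-4o-2024-11-20", 60.4, 42, 51.2),
    ("Llama3.1-70B-Inst", 50.4, 26, 38.2),
    ("ToolACE-MT", 25.2, 16, 20.6),
    ("ToolACE-MT", 22.6, 6, 14.3),
    ("Multi-Agent Simulation", 21.7, 10, 15.9),
    ("ToolACE-MT", 9.5, 6, 7.8),
    ("Llama3.1-8B-Inst", 6.1, 26, 16.1),
]


def matches_mean(first, second, last, tol=0.06):
    """Return True if `last` equals the mean of `first` and `second`,
    allowing for rounding to one decimal place (assumed, not confirmed)."""
    return abs((first + second) / 2 - last) < tol


# True when every row is consistent with the "mean of two scores" reading.
consistent = all(matches_mean(a, b, c) for _, a, b, c in rows)
print(consistent)
```

If the check holds for every row, the last column can be read as a simple unweighted average of the two per-domain scores.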