
Agent Performance on ACEBench-en

End-to-End Accuracy: 56

GPT-4o-2024-11-20

Updated 5mo ago

Evaluation Results

Method (date)	Links	End-to-End Accuracy	(unlabeled)	(unlabeled)
GPT-4o-2024-11-20 2025.08		56	-	77.8
Llama3.1-70B-Inst 2025.08		41	-	62.5
ToolACE-MT 2025.08		8.4	-	34
Llama3.1-8B-Inst 2025.08		6.7	-	18.3
Multi-Agent Simulation 2025.08		6.7	-	15
ToolACE-MT 2025.08		1.7	-	28.5
ToolACE-MT 2025.08		1.7	-	22.8
DS V3.2-Thinking 2026.02		-	81.4	-
Gemini 2.5-Pro 2026.02		-	80.9	-
GPT-5 (High) 2026.02		-	79.3	-
Gemini 3-Pro 2026.02		-	80.9	-
ERNIE 5.0 2026.02		-	87.7	-