ACEBench

Benchmarks

Task Name                    Dataset Name                       SOTA Result                     Trend
Agent Performance            ACEBench Agent                     Agent Score: 78                 36
Multi-turn agent task        ACEBench multi-turn (test)         Process Accuracy: 76.5          15
Agentic Performance          ACEBench Agent                     End-to-End Accuracy: 60         15
Cross-Lingual Planning       ACEBench                           Score (En): 78.3                14
Agent Capability Evaluation  ACEBench Agent                     Multi-Step Reasoning Score: 95  13
Agentic Tool-use             ACEBench (agent-task)              Multi Turn Success Rate: 97.5   13
Function Calling             ACEBench Normal                    Accuracy: 75.6                  13
Function Calling             ACEBench Normal (test)             Summary Score: 53               11
Tool Use                     ACEBench-en (out-of-distribution)  Normal Score: 77.9              8
Multi-turn Dialogue          ACEBench En                        MT Accuracy: 68                 7
Agentic Performance          ACEBench-en                        End-to-End Accuracy: 56         7
Agentic Performance          ACEBench-zh                        Accuracy: 89.6                  5