| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Agent Performance | ACEBench Agent | Agent Score: 78 | 36 |
| Tool-calling | ACEBench Extended Setting | Overall Score: 65.17 | 18 |
| Tool-calling | ACEBench Standard Setting | Overall Score: 68.92 | 18 |
| Tool Use | ACEBench Parallel | Accuracy: 81 | 15 |
| Tool Use | ACEBench Single | Accuracy: 90 | 15 |
| Multi-turn agent task | ACEBench multi-turn (test) | Process Accuracy: 76.5 | 15 |
| Agentic Performance | ACEBench Agent | End-to-End Accuracy: 60 | 15 |
| Cross-Lingual Planning | ACEBench | Score (En): 78.3 | 14 |
| Agent Capability Evaluation | ACEBench Agent | Multi-Step Reasoning Score: 95 | 13 |
| Agentic Tool-use | ACEBench (agent-task) | Multi Turn Success Rate: 97.5 | 13 |
| Function Calling | ACEBench Normal | Accuracy: 75.6 | 13 |
| Function Calling | ACEBench Normal (test) | Summary Score: 53 | 11 |
| Tool-use | ACEBench | Accuracy: 61.8 | 8 |
| Tool Use | ACEBench-en (out-of-distribution) | Normal Score: 77.9 | 8 |
| Multi-turn Dialogue | ACEBench En | MT Accuracy: 68 | 7 |
| Agentic Performance | ACEBench-en | End-to-End Accuracy: 56 | 7 |
| Agentic Performance | ACEBench-zh | Accuracy: 89.6 | 5 |