Agent Performance on ACEBench-en
[Chart: End-to-End Accuracy on ACEBench-en. Highlighted result: 56, GPT-4o-2024-11-20 (Aug 18, 2025). Selectable metrics: End-to-End Accuracy, Accuracy, Process Accuracy.]
Updated 4d ago
Evaluation Results
| Method | Date | End-to-End Accuracy | Accuracy | Process Accuracy |
|---|---|---|---|---|
| GPT-4o-2024-11-20 | 2025.08 | 56 | - | 77.8 |
| Llama3.1-70B-Inst | 2025.08 | 41 | - | 62.5 |
| ToolACE-MT | 2025.08 | 8.4 | - | 34 |
| Llama3.1-8B-Inst | 2025.08 | 6.7 | - | 18.3 |
| Multi-Agent Simulation | 2025.08 | 6.7 | - | 15 |
| ToolACE-MT (Ablation=Without Offli...) | 2025.08 | 1.7 | - | 28.5 |
| ToolACE-MT (Ablation=Without Itera...) | 2025.08 | 1.7 | - | 22.8 |
| DS V3.2-Thinking | 2026.02 | - | 81.4 | - |
| Gemini 2.5-Pro | 2026.02 | - | 80.9 | - |
| GPT-5 (High) | 2026.02 | - | 79.3 | - |
| Gemini 3-Pro | 2026.02 | - | 80.9 | - |
| ERNIE 5.0 | 2026.02 | - | 87.7 | - |