
Agent Performance on ACEBench-en

56 End-to-End Accuracy

GPT-4o-2024-11-20

Aug 18, 2025
Updated 4d ago

Evaluation Results

Method Links
56-77.8
41-62.5
2025.08
8.4-34
6.7-18.3
6.7-15
2025.08
1.7-28.5
2025.08
1.7-22.8
-81.4-
-80.9-
-79.3-
-80.9-
2026.02
-87.7-