Agent Execution on EnterpriseArena (test)
[Chart: Execution Accuracy over time. Top point: Gemini-2.5 Pro, 71, Mar 23, 2026]
Updated 25d ago
Evaluation Results
Method                 Setting                      Date      Execution Accuracy
Gemini-2.5 Pro         Prompting=2-shot             2026.03   71
Claude-3.5-Sonnet      Prompting=2-shot             2026.03   60
GPT-4o                 Prompting=2-shot             2026.03   45
Qwen3-8B Agentic GRPO  Training Strategy=Agen...    2026.03   43
ToolAce                Training Data=26K-trained    2026.03   39
Qwen3-8B SFT           Training Strategy=SFT,...    2026.03   35
Qwen3-8B Base          Prompting=2-shot             2026.03   31
xLAM-2-70B             Training Data=60K-trained    2026.03   15