Agent Execution on EnterpriseBench (test)
[Figure: Execution Accuracy chart. Labeled point: Claude-3.5-Sonnet, 55, Mar 23, 2026.]
Evaluation Results
| Method | Configuration | Date | Execution Accuracy |
|---|---|---|---|
| Claude-3.5-Sonnet | Prompting=2-shot | 2026.03 | 55 |
| Gemini-2.5 Pro | Prompting=2-shot | 2026.03 | 55 |
| Qwen3-8B Agentic GRPO | Training Strategy=Agen... | 2026.03 | 51 |
| GPT-4o | Prompting=2-shot | 2026.03 | 47 |
| ToolAce | Training Data=26K-trained | 2026.03 | 41 |
| xLAM-2-70B | Training Data=60K-trained | 2026.03 | 40 |
| Qwen3-8B SFT | Training Strategy=SFT,... | 2026.03 | 38 |
| Qwen3-8B Base | Prompting=2-shot | 2026.03 | 35 |