Agent Execution on CRMArena (test)
[Chart: Execution Accuracy; Gemini-2.5 Pro, 49; Mar 23, 2026]
Updated 25d ago
Evaluation Results

Method                  Setting                      Date      Execution Accuracy
Gemini-2.5 Pro          Prompting=2-shot             2026.03   49
Qwen3-8B Agentic GRPO   Training Strategy=Agen...    2026.03   35
Claude-3.5-Sonnet       Prompting=2-shot             2026.03   34
GPT-4o                  Prompting=2-shot             2026.03   32
Qwen3-8B SFT            Training Strategy=SFT,...    2026.03   30
Qwen3-8B Base           Prompting=2-shot             2026.03   25
xLAM-2-70B              Training Data=60K-trained    2026.03   12
ToolAce                 Training Data=26K-trained    2026.03   10