Agent Execution on tau-Bench (test)
Top result: Gemini-2.5 Pro, Execution Accuracy 59 (Mar 23, 2026)
Updated 25d ago
Evaluation Results
Method                 Setting                     Date      Execution Accuracy
Gemini-2.5 Pro         Prompting=2-shot            2026.03   59
Claude-3.5-Sonnet      Prompting=2-shot            2026.03   56
GPT-4o                 Prompting=2-shot            2026.03   54
Qwen3-8B Agentic GRPO  Training Strategy=Agen...   2026.03   42
Qwen3-8B SFT           Training Strategy=SFT,...   2026.03   36
Qwen3-8B Base          Prompting=2-shot            2026.03   33
xLAM-2-70B             Training Data=60K-trained   2026.03   17
ToolAce                Training Data=26K-trained   2026.03   15