
Agent Execution on tau-Bench (test)

Execution Accuracy: 59

Gemini-2.5 Pro

[Chart: Execution Accuracy over time; latest point Mar 23, 2026]
Updated 25d ago

Evaluation Results

Date       Execution Accuracy
2026.03    59
2026.03    56
2026.03    54
2026.03    42
2026.03    36
2026.03    33
2026.03    17
2026.03    15