
Multi-turn Agent Decision Making on tau-Bench (test)

55.8 Success Rate

H-EPM

[Chart: Success Rate; Dec 8, 2025]
Updated 4d ago

Evaluation Results

Date       Success Rate
2025.12    55.8
2025.12    54.4
2025.12    53.4
2025.12    52
2025.12    51
2025.12    43.5
2025.12    37.3