Tool Use Reasoning on τ-Bench

Top result: 63.9 Avg Accuracy (o1)

[Chart of Avg Accuracy results, Feb 2, 2026]

Evaluation Results

Date     Method  Avg Accuracy  Sub-score 1  Sub-score 2
2026.02  o1      63.9          73.5         54.2
2026.02          59.8          78.3         41.2
2026.02          52.9          62.8         43.0
2026.02          51.3          56.5         46.0
2026.02          48.3          57.7         38.8
2026.02          47.8          55.6         40.0
2026.02          47.6          50.7         44.4
2026.02          45.0          52.5         37.6
2026.02          42.4          50.7         34.0
2026.02          37.8          49.5         26.0
2026.02          36.6          39.6         33.6
2026.02          33.6          41.6         25.6
2026.02          31.5          40.6         22.4
2026.02          29.0          34.7         23.2
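The page does not say how Avg Accuracy is computed. In every row, though, it matches the unweighted mean of the two sub-scores to within one-decimal rounding (for the top row, (73.5 + 54.2) / 2 = 63.85, shown as 63.9). The Python sketch below checks that relationship, assuming it holds. The names ROWS, TOLERANCE, mean_of_subscores, and check_rows are hypothetical and are not part of the leaderboard. The sketch uses decimal.Decimal built from strings so that values like 45.05 are compared exactly instead of through binary floating point.

```python
from decimal import Decimal

# Hypothetical transcription of the table: (Avg Accuracy, Sub-score 1, Sub-score 2).
# Values are strings so Decimal represents them exactly.
ROWS = [
    ("63.9", "73.5", "54.2"),
    ("59.8", "78.3", "41.2"),
    ("52.9", "62.8", "43.0"),
    ("51.3", "56.5", "46.0"),
    ("48.3", "57.7", "38.8"),
    ("47.8", "55.6", "40.0"),
    ("47.6", "50.7", "44.4"),
    ("45.0", "52.5", "37.6"),
    ("42.4", "50.7", "34.0"),
    ("37.8", "49.5", "26.0"),
    ("36.6", "39.6", "33.6"),
    ("33.6", "41.6", "25.6"),
    ("31.5", "40.6", "22.4"),
    ("29.0", "34.7", "23.2"),
]

# Averages are shown to one decimal place, so allow half a unit in the
# last place (0.05). The rounding rule the site uses is unknown, which is
# why this checks a tolerance instead of an exact rounded value.
TOLERANCE = Decimal("0.05")


def mean_of_subscores(a: str, b: str) -> Decimal:
    """Unweighted mean of two sub-scores (assumed averaging rule)."""
    return (Decimal(a) + Decimal(b)) / 2


def check_rows(rows):
    """Return rows whose reported average is not within TOLERANCE of the mean."""
    mismatches = []
    for avg, s1, s2 in rows:
        if abs(Decimal(avg) - mean_of_subscores(s1, s2)) > TOLERANCE:
            mismatches.append((avg, s1, s2))
    return mismatches


if __name__ == "__main__":
    bad = check_rows(ROWS)
    print(f"{len(ROWS) - len(bad)} of {len(ROWS)} rows consistent")
```

Running the script prints "14 of 14 rows consistent". Several rows, such as 51.3 from a mean of 51.25 and 45.0 from a mean of 45.05, round in opposite directions at an exact tie. For that reason the check uses a tolerance instead of assuming a specific rounding mode.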