
Tool Use Reasoning on τ-Bench

63.9 Avg Accuracy

o1

[Chart: Avg Accuracy. Axis ticks: 27.604, 37.027, 46.45, 55.873. Date shown: Feb 2, 2026]
Updated 1mo ago

Evaluation Results

Method    Links

Date       Avg Accuracy   (unlabeled)   (unlabeled)
2026.02    63.9           73.5          54.2
2026.02    59.8           78.3          41.2
2026.02    52.9           62.8          43
2026.02    51.3           56.5          46
2026.02    48.3           57.7          38.8
2026.02    47.8           55.6          40
2026.02    47.6           50.7          44.4
2026.02    45             52.5          37.6
2026.02    42.4           50.7          34
2026.02    37.8           49.5          26
2026.02    36.6           39.6          33.6
2026.02    33.6           41.6          25.6
2026.02    31.5           40.6          22.4
2026.02    29             34.7          23.2
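
Each result row above can be read as three scores (for example 63.9, 73.5, 54.2), and the first score of the top row matches the headline Avg Accuracy figure. The page does not say how the first score is computed. The sketch below checks one plausible reading, namely that it is the unweighted mean of the other two scores, rounded to one decimal place. This reading is an assumption drawn from the numbers, and the names `rows` and `is_mean_of_parts` are illustrative only.

```python
# Scores copied from the result rows, read as (first, second, third).
# Assumption: the first score is the mean of the other two; the page
# itself does not state this.
rows = [
    (63.9, 73.5, 54.2),
    (59.8, 78.3, 41.2),
    (52.9, 62.8, 43.0),
    (51.3, 56.5, 46.0),
    (48.3, 57.7, 38.8),
    (47.8, 55.6, 40.0),
    (47.6, 50.7, 44.4),
    (45.0, 52.5, 37.6),
    (42.4, 50.7, 34.0),
    (37.8, 49.5, 26.0),
    (36.6, 39.6, 33.6),
    (33.6, 41.6, 25.6),
    (31.5, 40.6, 22.4),
    (29.0, 34.7, 23.2),
]


def is_mean_of_parts(first, second, third, tol=0.05):
    """Return True if `first` equals the mean of the other two scores,
    allowing for rounding to one decimal place (half a unit in the
    last digit, plus a small floating-point margin)."""
    return abs(first - (second + third) / 2) <= tol + 1e-9


# True only if every row fits the mean-of-two reading.
consistent = all(is_mean_of_parts(*row) for row in rows)
print(consistent)
```

A tolerance comparison is used instead of `round()` because values such as 63.85 are not exactly representable in binary floating point and may round in either direction.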