
Failure attribution on τ-bench

Agent Accuracy: 75.9

Our Baseline

Feb 2, 2026
Updated 3mo ago

Evaluation Results

Method    Links
2026.02    75.9    32.2
2026.02    62    17.2