Failure attribution on τ-bench
[Chart: Agent Accuracy and Step Accuracy over time; top Agent Accuracy: Our Baseline, 75.9 (Feb 2, 2026)]
Evaluation Results
Method | Configuration | Date | Agent Accuracy | Step Accuracy
Our Baseline | Backbone LLM=GPT-5 | 2026.02 | 75.9 | 32.2
Who&When* | Backbone LLM=GPT-5, Pr... | 2026.02 | 62 | 17.2
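The table reports two separate scores for each method. The sketch below shows how they could be computed. It assumes the exact-match definitions commonly used for Who&When-style failure attribution. Agent accuracy is the share of failed trajectories whose responsible agent is named correctly. Step accuracy is the share whose decisive error step is pinpointed exactly, scored independently of the agent. The `Attribution` record, the `attribution_accuracy` function, and the role labels are hypothetical and are not taken from this leaderboard.

```python
from dataclasses import dataclass


@dataclass
class Attribution:
    """One failure-attribution label or prediction for a single failed trajectory (hypothetical schema)."""
    agent: str  # role blamed for the failure (hypothetical labels)
    step: int   # index of the step judged to be the decisive error


def attribution_accuracy(preds, golds):
    """Return (agent_accuracy, step_accuracy) as percentages.

    Assumption: exact-match scoring, with the agent and step judged independently,
    so a prediction can get the step right while blaming the wrong agent.
    """
    if len(preds) != len(golds) or not golds:
        raise ValueError("need equal-length, non-empty prediction and label lists")
    n = len(golds)
    agent_hits = sum(p.agent == g.agent for p, g in zip(preds, golds))
    step_hits = sum(p.step == g.step for p, g in zip(preds, golds))
    return 100 * agent_hits / n, 100 * step_hits / n


# Toy example with four failed trajectories (invented data, not benchmark results).
golds = [Attribution("user", 3), Attribution("tool", 5),
         Attribution("assistant", 2), Attribution("assistant", 7)]
preds = [Attribution("user", 3), Attribution("tool", 4),
         Attribution("assistant", 6), Attribution("user", 7)]
print(attribution_accuracy(preds, golds))  # (75.0, 50.0)
```

In this toy example, 3 of the 4 predictions name the right agent, but only 2 hit the exact step. Step-level attribution is a stricter target, which is consistent with step accuracy sitting well below agent accuracy for both methods in the table.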