Tool-use agent evaluation on τ-bench retail
Chart: Pass@1 Success Rate over time (selectable metrics: Pass@1 through Pass@5 Success Rate). Top Pass@1 entry: FAMA, 44.173, Apr 28, 2026.
Updated 1mo ago
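On this leaderboard the success rate falls as k grows. That pattern fits the pass^k metric defined with τ-bench, where a task counts as solved only if all k independent trials succeed. It does not fit the code-generation pass@k, where one success out of k is enough and which can only rise with k. Below is a minimal sketch of the combinatorial estimator C(c, k) / C(n, k) averaged over tasks. It assumes every task was run n ≥ k times with a binary success outcome. The function name `pass_hat_k` and the toy outcomes are hypothetical, and the leaderboard's own evaluation code may differ.

```python
from math import comb


def pass_hat_k(trial_outcomes: list[list[bool]], k: int) -> float:
    """Estimate pass^k: the chance that k i.i.d. trials of a task ALL succeed,
    averaged over tasks (hypothetical helper, not the benchmark's code).

    trial_outcomes[t] holds the success flags of every trial run on task t.
    Expects a non-empty list and at least k trials per task.
    """
    total = 0.0
    for outcomes in trial_outcomes:
        n = len(outcomes)
        c = sum(outcomes)  # number of successful trials (bools sum as ints)
        if n < k:
            raise ValueError(f"need at least {k} trials per task, got {n}")
        # Unbiased per-task estimate of p^k; comb(c, k) is 0 when c < k.
        total += comb(c, k) / comb(n, k)
    return total / len(trial_outcomes)


if __name__ == "__main__":
    # Toy data: one always-solved task, one solved 3/5 times, one never solved.
    tasks = [[True] * 5, [True, True, True, False, False], [False] * 5]
    for k in range(1, 6):
        print(f"pass^{k} = {pass_hat_k(tasks, k):.3f}")
```

Because a task only contributes when c ≥ k, tasks the agent solves inconsistently drop out as k grows. This is why pass^k rewards reliability rather than occasional success.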
Evaluation Results
Success rate (%) at Pass@1 through Pass@5:

| Method | Backbone | Date | Pass@1 | Pass@2 | Pass@3 | Pass@4 | Pass@5 |
|---|---|---|---|---|---|---|---|
| FAMA | Qwen2.5-72B-I... | 2026.04 | 44.173 | 34.26 | 30.26 | 28.17 | 26.95 |
| ReAct | Qwen2.5-72B-I... | 2026.04 | 43.47 | 32.26 | 26.69 | 23.13 | 20.86 |
| SR | Qwen2.5-72B-I... | 2026.04 | 42.9 | 30.7 | 25 | 21.2 | 18.2 |
| FAMA | Qwen3-32B | 2026.04 | 40.5 | 26.9 | 19.9 | 15.3 | 12.2 |
| ReAct | Qwen3-32B | 2026.04 | 38 | 25 | 18.8 | 15.3 | 10 |
| SR | Qwen3-32B | 2026.04 | 26.7 | 15.6 | 11.3 | 9.2 | 7.8 |
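The table can be summarized with two simple quantities. The first is how much of its Pass@1 rate each method keeps at Pass@5. The second is FAMA's margin, in percentage points, over the stronger of the other two methods at each k. The sketch below hard-codes the reported values. The dictionary layout and the helper names `retention` and `margin_over_best_baseline` are our own illustration and not part of the benchmark. The truncated backbone name is kept exactly as it appears above.

```python
# Pass@1..Pass@5 success rates (%) copied from the Evaluation Results table.
RESULTS = {
    "Qwen2.5-72B-I...": {
        "FAMA": [44.173, 34.26, 30.26, 28.17, 26.95],
        "ReAct": [43.47, 32.26, 26.69, 23.13, 20.86],
        "SR": [42.9, 30.7, 25.0, 21.2, 18.2],
    },
    "Qwen3-32B": {
        "FAMA": [40.5, 26.9, 19.9, 15.3, 12.2],
        "ReAct": [38.0, 25.0, 18.8, 15.3, 10.0],
        "SR": [26.7, 15.6, 11.3, 9.2, 7.8],
    },
}


def retention(rates: list[float]) -> float:
    """Fraction of the Pass@1 rate that survives at the largest reported k."""
    return rates[-1] / rates[0]


def margin_over_best_baseline(methods: dict[str, list[float]],
                              target: str = "FAMA") -> list[float]:
    """Per-k difference (percentage points) between `target` and the best other method."""
    others = [rates for name, rates in methods.items() if name != target]
    return [methods[target][i] - max(r[i] for r in others)
            for i in range(len(methods[target]))]


if __name__ == "__main__":
    for backbone, methods in RESULTS.items():
        print(backbone)
        for name, rates in methods.items():
            print(f"  {name:6s} Pass@5/Pass@1 = {retention(rates):.3f}")
        margins = margin_over_best_baseline(methods)
        print("  FAMA margin by k:", [round(m, 2) for m in margins])
```

With the values above, FAMA keeps roughly 61% of its Pass@1 rate at Pass@5 on the Qwen2.5 backbone, versus about 48% for ReAct and 42% for SR. Its margin over the best baseline widens from about 0.7 points at Pass@1 to about 6.1 points at Pass@5.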