Tool-use agent evaluation on τ-bench retail (test)
[Chart: Pass@1 Success Rate over time. Highlighted point: FAMA, 34.6, Apr 28, 2026. Selectable metrics: Pass@1 through Pass@5 Success Rate.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Pass@1 Success Rate | Pass@2 Success Rate | Pass@3 Success Rate | Pass@4 Success Rate | Pass@5 Success Rate |
|---|---|---|---|---|---|---|---|
| FAMA | Backbone=Qwen3-4B-Inst... | 2026.04 | 34.6 | 24.1 | 19.3 | 16.3 | 13.9 |
| SR | Backbone=Qwen3-4B-Inst... | 2026.04 | 31.3 | 19.3 | 14.34 | 11.82 | 10.43 |
| ReAct | Backbone=Qwen3-4B-Inst... | 2026.04 | 17.22 | 12.35 | 10.61 | 9.57 | 8.7 |