Tool-use agent evaluation on τ-bench airline (test)
[Chart: Pass@1 Success Rate. Best result: FAMA, 37.6 (Apr 28, 2026). Metrics: Pass@1 through Pass@5 Success Rate.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Pass@1 Success Rate | Pass@2 Success Rate | Pass@3 Success Rate | Pass@4 Success Rate | Pass@5 Success Rate |
|---|---|---|---|---|---|---|---|
| FAMA | Backbone=Qwen3-4B-Inst... | 2026.04 | 37.6 | 32 | 28.3 | 26.7 | 26 |
| SR | Backbone=Qwen3-4B-Inst... | 2026.04 | 33.2 | 23.9 | 19.6 | 16.39 | 14 |
| ReAct | Backbone=Qwen3-4B-Inst... | 2026.04 | 32 | 28 | 26.8 | 26.4 | 26 |
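Every method's score falls as k grows. That pattern matches the τ-bench pass^k metric (the chance that all k independent trials of a task succeed) rather than the conventional pass@k (at least one of k succeeds), which cannot decrease with k. The sketch below shows how such a score could be computed, assuming the page reports pass^k under the "Pass@k" label. The function name `pass_hat_k` and the per-task success counts are hypothetical and are not taken from the leaderboard.

```python
from math import comb


def pass_hat_k(successes_per_task, num_trials, k):
    """Estimate pass^k: the mean over tasks of C(c, k) / C(n, k).

    successes_per_task: number of successful trials c for each task.
    num_trials: number of trials n run per task (same for all tasks).
    k: number of trials that must all succeed.
    """
    if not 1 <= k <= num_trials:
        raise ValueError("k must satisfy 1 <= k <= num_trials")
    denom = comb(num_trials, k)
    # comb(c, k) is 0 whenever c < k, so tasks with too few successes add nothing.
    total = sum(comb(c, k) for c in successes_per_task)
    return total / (denom * len(successes_per_task))


# Hypothetical data: 4 tasks, 5 trials each (illustrative only).
successes = [5, 3, 0, 1]
n = 5
scores = [pass_hat_k(successes, n, k) for k in range(1, n + 1)]
for k, score in enumerate(scores, start=1):
    print(f"pass^{k} = {score:.3f}")
```

Under this estimator the k = 1 value is the plain average success rate, and the values can only stay flat or fall as k increases, which is the shape seen in each row of the table.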