Agent Task Completion on τ-bench-retail
[Chart: Success Rate on τ-bench-retail. Top entry: SkillMAS, 70.2 (May 10, 2026). Selectable metrics: Success Rate and Pass@1 through Pass@5 Success Rate.]
Updated 22d ago
Evaluation Results
| Method | Details | Date | Success Rate | Pass@1 Success Rate | Pass@2 Success Rate | Pass@3 Success Rate | Pass@4 Success Rate | Pass@5 Success Rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SkillMAS | Evaluation Source=Inte... | 2026.05 | 70.2 | - | - | - | - | - |
| CDMem | Evaluation Source=Inte... | 2026.05 | 68.4 | - | - | - | - | - |
| Traj-Bootstrap | Evaluation Source=Inte... | 2026.05 | 68.4 | - | - | - | - | - |
| ReAct | Evaluation Source=Inte... | 2026.05 | 62.3 | - | - | - | - | - |
| Direct LLM | Evaluation Source=Inte... | 2026.05 | 61.4 | - | - | - | - | - |
| ReAct | Backbone=Qwen3-14B | 2026.04 | - | 25.2 | 17.8 | 14.7 | 13.2 | 12.1 |
| FAMA | Backbone=Qwen3-14B | 2026.04 | - | 37.9 | 25.7 | 19.7 | 16.3 | 14.7 |
| SR | Backbone=Qwen3-14B | 2026.04 | - | 32.1 | 19.3 | 14.69 | 12.52 | 11.3 |
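
The Pass@k figures for the Qwen3-14B rows fall as k grows (for FAMA, from 37.9 at k=1 to 14.7 at k=5). A conventional pass@k (at least one success in k attempts) can only rise with k, so the pattern is more consistent with the pass^k metric defined for τ-bench, where a task counts only if all k trials succeed. The page does not say which estimator it uses, so the following is only a minimal sketch under that assumption. The function names and the per-task success counts are hypothetical and are not taken from the leaderboard.

```python
from math import comb
from typing import Sequence


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Chance that k trials drawn without replacement from `trials` runs all succeed."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 1 <= k <= trials:
        raise ValueError("k must lie between 1 and trials")
    # comb(successes, k) is 0 when successes < k, so the estimate is 0 in that case.
    return comb(successes, k) / comb(trials, k)


def pass_at_k(successes: int, trials: int, k: int) -> float:
    """Chance that at least one of k trials drawn without replacement succeeds."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 1 <= k <= trials:
        raise ValueError("k must lie between 1 and trials")
    return 1.0 - comb(trials - successes, k) / comb(trials, k)


def benchmark_score(per_task_successes: Sequence[int], trials: int, k: int) -> float:
    """Average pass^k over tasks, reported as a percentage."""
    total = sum(pass_hat_k(c, trials, k) for c in per_task_successes)
    return 100.0 * total / len(per_task_successes)


if __name__ == "__main__":
    # Hypothetical data: successes out of 5 trials for each of 4 tasks.
    counts = [5, 3, 0, 4]
    for k in range(1, 6):
        print(f"k={k}: pass^k = {benchmark_score(counts, trials=5, k=k):.1f}")
```

With these made-up counts, the pass^k score shrinks as k increases because only tasks solved on every sampled trial keep contributing, which mirrors the downward trend in the Pass@1 to Pass@5 columns.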