Agent Performance on Tau-bench Retail
[Chart: Avg@4 and Pass@4 on Tau-bench Retail. Leading method: NoisyAgent, 60.31 Avg@4, May 26, 2026.]
Evaluation Results
Method      Backbone    Date      Avg@4   Pass@4
NoisyAgent  Qwen3-32B   2026.05   60.31   86.84
GSPO        Qwen3-32B   2026.05   58.55   84.21
GRPO        Qwen3-32B   2026.05   58.11   83.33
DAPO        Qwen3-32B   2026.05   56.58   80.7
Qwen3-32B   Qwen3-32B   2026.05   49.12   72.81
NoisyAgent  Qwen3-8B    2026.05   47.59   77.19
GSPO        Qwen3-8B    2026.05   46.49   74.56
GRPO        Qwen3-8B    2026.05   46.05   73.68
DAPO        Qwen3-8B    2026.05   44.52   71.05
Qwen3-8B    Qwen3-8B    2026.05   35.31   59.65
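
The page does not define Avg@4 or Pass@4. A common reading, assumed here and not confirmed by the source, is that each task is attempted 4 times, Avg@4 is the mean success rate over all attempts, and Pass@4 is the share of tasks solved in at least one attempt. That reading fits the fact that Pass@4 is at least Avg@4 in every row. The sketch below illustrates the two metrics under that assumption. The function names and the sample outcomes are hypothetical and are not taken from the benchmark.

```python
from typing import Sequence


def avg_at_k(outcomes: Sequence[Sequence[bool]]) -> float:
    """Percentage of successful trials across all tasks (assumed Avg@k)."""
    total_trials = sum(len(trials) for trials in outcomes)
    successes = sum(sum(trials) for trials in outcomes)
    return 100.0 * successes / total_trials


def pass_at_k(outcomes: Sequence[Sequence[bool]]) -> float:
    """Percentage of tasks with at least one successful trial (assumed Pass@k)."""
    solved = sum(1 for trials in outcomes if any(trials))
    return 100.0 * solved / len(outcomes)


# Hypothetical outcomes: 3 tasks, 4 trials each (True means the trial succeeded).
outcomes = [
    [True, True, False, True],
    [False, False, False, False],
    [False, True, False, False],
]

print(round(avg_at_k(outcomes), 2))   # 4 of 12 trials succeed -> 33.33
print(round(pass_at_k(outcomes), 2))  # 2 of 3 tasks solved at least once -> 66.67
```

Under these definitions Pass@4 can never be lower than Avg@4, because a task that contributes any successful trial to the average also counts as solved.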