Multi-turn Agent Decision Making on tau-Bench (test)
[Chart: Success Rate over time. Top result: H-EPM, 55.8, Dec 8, 2025]
Evaluation Results
Method        Details                    Date     Success Rate
H-EPM         Backbone=Qwen3-4B-Inst...  2025.12  55.8
H-EPM         Backbone=Qwen3-4B-Inst...  2025.12  54.4
H-EPM         Backbone=Qwen3-4B-Inst...  2025.12  53.4
AgentEvolver  Backbone=Qwen3-4B-Inst...  2025.12  52
GRPO          Backbone=Qwen3-4B-Inst...  2025.12  51
BASE          Backbone=Qwen3-4B-Inst...  2025.12  43.5
SFT           Backbone=Qwen3-4B-Inst...  2025.12  37.3