Multi-turn agent decision making on tau2-Bench (test)
Best result: H-EPM, Success Rate 22.3
[Chart: Success Rate over time; data point at Dec 8, 2025]
Evaluation Results
Method        Configuration              Date     Success Rate
H-EPM         Backbone=Qwen3-4B-Inst...  2025.12  22.3
H-EPM         Backbone=Qwen3-4B-Inst...  2025.12  21.8
H-EPM         Backbone=Qwen3-4B-Inst...  2025.12  21.3
GRPO          Backbone=Qwen3-4B-Inst...  2025.12  17.8
AgentEvolver  Backbone=Qwen3-4B-Inst...  2025.12  16.7
SFT           Backbone=Qwen3-4B-Inst...  2025.12  16.3
BASE          Backbone=Qwen3-4B-Inst...  2025.12  15.8