
Agent Task Completion on τ-BENCH (test)

0.791 Average Task Reward

H-EPM

[Chart: y-axis ticks 0.32612, 0.44681, 0.5675, 0.68819; x-axis tick Dec 8, 2025]
Updated 4d ago

Evaluation Results

Date      Average Task Reward   (unlabeled)   Method   Links
2025.12   0.791                 -
2025.12   0.739                 0
2025.12   0.71                  7.9
2025.12   0.672                 2.1
2025.12   0.671                 0
2025.12   0.661                 22.2
2025.12   0.658                 0
2025.12   0.652                 -0.9
2025.12   0.645                 -2
2025.12   0.644                 5.7
2025.12   0.634                 4.1
2025.12   0.624                 2.5
2025.12   0.609                 0
2025.12   0.575                 6.3
2025.12   0.572                 5.7
2025.12   0.567                 4.8
2025.12   0.541                 0
2025.12   0.496                 14
2025.12   0.458                 5.3
2025.12   0.451                 3.7
2025.12   0.451                 3.7
2025.12   0.435                 0
2025.12   0.421                 9.9
2025.12   0.398                 3.9
2025.12   0.383                 0
2025.12   0.376                 -1.8
2025.12   0.344                 -10.2