Task failure prediction and selective task completion on tau2-bench Retail 1.0

0.707 AUROC

TRACER

[Chart: AUROC over time; plotted date Feb 11, 2026]
Updated 4d ago

Evaluation Results

Date      AUROC   (unlabeled metric)
2026.02   0.707   0.67
2026.02   0.689   0.632
2026.02   0.673   0.725
2026.02   0.62    0.579
2026.02   0.617   0.571
2026.02   0.609   0.584
2026.02   0.577   0.555
2026.02   0.556   0.673
2026.02   0.553   0.661
2026.02   0.533   0.547
2026.02   0.53    0.544
2026.02   0.525   0.684
2026.02   0.468   0.497
2026.02   0.44    0.59
2026.02   0.428   0.467
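
For readers less familiar with the headline metric: AUROC for failure prediction can be read as the probability that a predictor assigns a higher failure score to a randomly chosen failed task than to a randomly chosen successful one (0.5 is chance, 1.0 is perfect ranking). The sketch below computes it by direct pairwise comparison. It is a minimal illustration, not the benchmark's evaluation code: the function name `auroc`, the convention that label 1 means "task failed", and the sample scores are all assumptions made for the example.

```python
def auroc(scores, labels):
    """AUROC by pairwise comparison.

    scores: predicted failure scores (higher = more likely to fail).
    labels: 1 if the task failed, 0 if it succeeded (assumed convention).
    """
    failed = [s for s, y in zip(scores, labels) if y == 1]
    succeeded = [s for s, y in zip(scores, labels) if y == 0]
    if not failed or not succeeded:
        raise ValueError("need at least one failed and one successful task")

    wins = 0.0
    for f in failed:
        for s in succeeded:
            if f > s:
                wins += 1.0   # failed task correctly ranked above successful one
            elif f == s:
                wins += 0.5   # ties count as half
    return wins / (len(failed) * len(succeeded))


# Hypothetical scores and outcomes, for illustration only.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 0]
print(auroc(scores, labels))
```

The pairwise form is quadratic in the number of tasks, which is fine for a benchmark-sized evaluation; a rank-based implementation gives the same value more efficiently on large sets.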