
Task failure prediction and selective task completion on tau2-bench Telecom 1.0

Best result: TRACER, 0.809 AUROC

[Chart: AUROC by result date (Feb 11, 2026)]

Evaluation Results

Rank  Date     AUROC  Metric 2
1     2026.02  0.809  0.52
2     2026.02  0.765  0.613
3     2026.02  0.691  0.517
4     2026.02  0.686  0.394
5     2026.02  0.683  0.389
6     2026.02  0.673  0.446
7     2026.02  0.651  0.395
8     2026.02  0.649  0.392
9     2026.02  0.639  0.379
10    2026.02  0.621  0.392
11    2026.02  0.559  0.358
12    2026.02  0.548  0.27
13    2026.02  0.534  0.347
14    2026.02  0.507  0.283
15    2026.02  0.423  0.257
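The AUROC column measures how well a method's failure score ranks tasks that actually fail above tasks that succeed. The following is a minimal sketch of that computation, assuming each task receives a scalar failure score and a binary label (1 = failed, 0 = succeeded). The function name `auroc` is hypothetical, and this is not the benchmark's own evaluation code:

```python
from itertools import product


def auroc(scores, labels):
    """AUROC as the Mann-Whitney probability that a failed task (label 1)
    gets a higher failure score than a succeeded task (label 0).
    Tied scores count as half a correct ordering.
    Assumption: higher score means "more likely to fail"."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # failed tasks
    neg = [s for s, y in zip(scores, labels) if y == 0]  # succeeded tasks
    if not pos or not neg:
        raise ValueError("need at least one failed and one succeeded task")
    correct = 0.0
    for p, n in product(pos, neg):  # compare every failed/succeeded pair
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(pos) * len(neg))
```

For example, `auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])` returns 0.75 because three of the four failed/succeeded pairs are ordered correctly. A score of 0.5 corresponds to random ranking, and 1.0 corresponds to perfect separation. The pairwise loop costs O(P·N), which is adequate for a few hundred tasks. For larger evaluations, a rank-based formula scales better.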