
Who & When

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Error Attribution | Who&When | Pair µF1 | 8.1 | 30 |
| Failure Attribution | Who&When | Agent Accuracy | 60.79 | 22 |
| Trajectory Attribution | Who&When n=58 (Hand-Crafted) | Agent-level Accuracy | 73 | 15 |
| Trajectory Attribution | Who&When Algorithm-Generated n=126 | Agent-level Accuracy | 68 | 15 |
| Failure Attribution | Who&When Total | Step-level Accuracy | 36.22 | 13 |
| Failure Attribution | Who&When Hand-Crafted | Step-level Accuracy | 41.38 | 13 |
| Failure Attribution | Who&When Algorithm-Generated | Step-level Accuracy | 42.86 | 13 |
| Online auditing | Who&When | Step Accuracy | 57.69 | 8 |
| Error Forecasting | Who&When | Eta (%) | 100 | 6 |
| Failure attribution | Who & When Boundary | Agent Attribution Accuracy | 38.71 | 6 |
| Failure attribution | Who & When Remove ID | Agent Attribution Accuracy | 26.47 | 6 |
| Failure attribution | Who & When Baseline | Agent Attribution Accuracy | 54.33 | 6 |