
When2Call

Benchmarks

Task Name                  Dataset Name  SOTA Result                Trend
Temporal Reasoning         When2Call     Performance Score: 100     8
Social Reasoning           When2Call     Accuracy: 54.5             5
Decision Making Reasoning  When2Call     Cumulative Score (CS): 79  4