
When2Call

Benchmarks

Task Name                       Dataset Name  SOTA Result                Trend
Tool-calling                    When2Call     F1 Score: 76.8             42
Tool-use gating                 When2Call     TC Accuracy: 99.23         30
Multiple Choice Classification  When2Call     Accuracy: 78.63            24
Temporal Reasoning              When2Call     Performance Score: 100     8
Social Reasoning                When2Call     Accuracy: 54.5             5
Decision Making Reasoning       When2Call     Cumulative Score (CS): 79  4