
Tools

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Sequential Recommendation | Tools (test) | HR@10: 4.91 | 12 |
| Generative Recommendation | Tools (Period 4) | H@5: 2.46 | 8 |
| Generative Recommendation | Tools (Period 3) | Hit Rate@5: 2.18 | 8 |
| Generative Recommendation | Tools (Period 2) | H@5: 1.81 | 8 |
| Generative Recommendation | Tools (Period 1) | Hit Rate@5: 2.26 | 8 |
| Task Planning | Tools PCD distribution (test) | Success Rate: 100 | 8 |
| Task Planning | Tools PCD (train) | Success Rate: 100 | 8 |
| Recommendation | Tools TIGER Backbone (Period 4) | H@5: 2.46 | 7 |
| Recommendation | Tools TIGER Backbone (Period 2) | H@5: 2.33 | 7 |
| Recommendation | Tools TIGER Backbone (Period 1) | H@5: 2.39 | 7 |
| Tool Use Accuracy | Seen Tools | SRt: 100 | 7 |
| Human Evaluation | Tools 100 pairs | Win Rate: 88 | 1 |
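The recommendation rows report hit-rate scores (HR@10, H@5, Hit Rate@5), but the listing does not define them. The sketch below uses the common definition of Hit Rate@K: the share of test cases whose held-out item appears among a model's top-K ranked items. Two points are assumptions and are not stated by the listing: that the reported values are percentages, and that each case has exactly one held-out item. The function name `hit_rate_at_k` and the toy data are hypothetical.

```python
from typing import Hashable, Sequence


def hit_rate_at_k(
    ranked_lists: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int,
) -> float:
    """Percentage of test cases whose single held-out target is in the top-k.

    Assumes one ranked list and one target item per test case (hypothetical setup).
    """
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one ranked list per target")
    if not targets:
        raise ValueError("need at least one test case")
    # A "hit" means the target item appears among the first k ranked items.
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    # Scale to a percentage (assumed convention for the scores in the table).
    return 100.0 * hits / len(targets)


# Hypothetical toy data: three users, each with a model's ranked item IDs.
ranked = [
    ["a", "b", "c", "d", "e", "f"],
    ["x", "y", "z", "w", "v", "u"],
    ["p", "q", "r", "s", "t", "o"],
]
held_out = ["c", "u", "p"]
print(hit_rate_at_k(ranked, held_out, k=5))
```

With K = 5, the second user's target sits at rank 6 and counts as a miss. Raising K to 6 turns it into a hit, which is why HR@10 and H@5 scores are not directly comparable across rows.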