
RestBench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Tool Learning | RestBench TMDB | Success Rate: 86.2 | 32 |
| Task Planning | RestBench TMDB | Node F1: 82.63 | 25 |
| Sequential Tool Use | RestBench Spotify | Success Rate: 86.1 | 22 |
| Tool Planning | RestBench Spotify | Pass Rate: 61.25 | 12 |
| Tool Planning | RestBench TMDB | Pass Rate: 72.4 | 12 |
| Tool Learning | RestBench Spotify | Success: 87.72 | 10 |
| Task Planning | RestBench TMDB v1 (test) | n-F1: 82.56 | 4 |
| Tool selection and execution success | RestBench Spotify | - | 0 |
| Tool selection and execution success | RestBench TMDB | - | 0 |