
RoTBench

Benchmarks

Task Name               Dataset Name          SOTA Result                    Trend
Tool Use                RoTBench Multi-turn   Tool Selection Accuracy: 72.9  35
Tool Use                RoTBench Single-turn  Tool Selection: 84.8           35
Robustness of Tool-use  RoTBench              TS: 78.76                      27