WorkDL

All Tasks

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Language Understanding | All tasks (25 tasks) (val) | Overall Accuracy: 85.93 | 13 |
| Retrieval | All tasks Pooled | nDCG@10: 35.5 | 4 |
| Tool Selection | All Tasks | Tools Correct: 82 | 4 |