
Sorting

Benchmarks

Task Name                      Dataset Name       SOTA Result         Trend
Sorting                        Sorting            Accuracy: 99.6      7
Failure / predicate detection  SORTING            F1 Score: 59.3      4
SORTING                        SORTING w/o human  Success Rate: 93    2
SORTING                        SORTING w/ human   Success Rate: 80    2