Tracking Shuffled Objects

Benchmarks

Task Name                 | Dataset Name                               | SOTA Result                | Trend
Logic reasoning           | Tracking Shuffled Objects BBH              | Accuracy: 71.33            | 59
Tracking Shuffled Objects | Tracking Shuffled Objects 5 objects (test) | Accuracy (TSO 5-obj): 94.1 | 16