Unlocking Slot Attention by Changing Optimal Transport Costs

About

Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects because it cannot break ties. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective, we propose MESH (Minimize Entropy of Sinkhorn), a cross-attention module that combines the tie-breaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.

Yan Zhang, David W. Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek • 2023
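
To make the tie problem in the abstract concrete, the following is a minimal NumPy sketch of entropy-regularized optimal transport (Sinkhorn iterations) between slots and inputs. It is not the authors' MESH implementation. The function names (`sinkhorn`, `entropy`), the uniform marginals, the toy cost matrices, and the choice of `eps` are all assumptions made for illustration. The sketch only measures the entropy of the Sinkhorn plan, the quantity that MESH, per its name, seeks to minimize. It shows that two slots with identical costs receive an identical, maximum-entropy assignment, whereas distinct costs yield a sharp, low-entropy one.

```python
import numpy as np


def logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn(cost, eps=0.05, n_iters=200):
    """Entropy-regularized transport plan with uniform marginals (assumed).

    cost: array of shape (n_slots, n_inputs).
    eps:  regularization strength; smaller values give sharper plans.
    """
    n, m = cost.shape
    log_a = np.full(n, -np.log(n))  # uniform mass over slots
    log_b = np.full(m, -np.log(m))  # uniform mass over inputs
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iters):
        # Alternate dual updates so row sums, then column sums, match the marginals.
        f = eps * (log_a - logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - logsumexp((f[:, None] - cost) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - cost) / eps)


def entropy(plan):
    # Shannon entropy of the transport plan, skipping exact zeros.
    p = plan[plan > 0]
    return float(-np.sum(p * np.log(p)))


# Toy case 1: two slots with identical costs (a tie between slots).
cost_tied = np.array([[0.0, 0.0, 1.0, 1.0],
                      [0.0, 0.0, 1.0, 1.0]])
# Toy case 2: each slot prefers a different pair of inputs.
cost_untied = np.array([[0.0, 0.0, 1.0, 1.0],
                        [1.0, 1.0, 0.0, 0.0]])

plan_tied = sinkhorn(cost_tied)
plan_untied = sinkhorn(cost_untied)
h_tied = entropy(plan_tied)
h_untied = entropy(plan_untied)
```

In the tied case both rows of the plan are equal, so the two slots cannot be told apart no matter how small `eps` is. In the untied case the plan concentrates each slot's mass on its preferred inputs and the entropy drops accordingly.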

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Classification | PokerRules standard (test) | Task Accuracy: 99.93 | 6 |
| Image Classification | MM-A in-distribution (test) | Accuracy: 98.86 | 6 |
| Image Classification | MM-A out-of-distribution (OOD) | Task Accuracy: 18.26 | 6 |
| Classification | PokerRules Extrapolation: 5 cards (In-distribution class) | Task Accuracy: 37.8 | 5 |
| Image Classification | MM-A Extrapolation 4 digits | Task Accuracy: 37.5 | 5 |
| Image Classification | MM-A Extrapolation 5 digits | Task Accuracy: 12 | 5 |
| Addition | CLEVR-Addition (test) | Task Accuracy: 96.97 | 3 |
| Addition | CLEVR-Addition 7 objects (extrapolation) | Task Accuracy: 0.5 | 3 |