QASA: Quality-Guided K-Adaptive Slot Attention for Unsupervised Object-Centric Learning

About

Slot Attention, an approach that binds different objects in a scene to a set of "slots", has become a leading method in unsupervised object-centric learning. Most methods assume a fixed slot count K; to better accommodate the dynamic nature of object cardinality, a few works have explored K-adaptive variants. However, existing K-adaptive methods still suffer from two limitations. First, they do not explicitly constrain slot-binding quality, so low-quality slots lead to ambiguous feature attribution. Second, adding a slot-count penalty to the reconstruction objective creates conflicting optimization goals: reducing the number of active slots versus maintaining reconstruction fidelity. As a result, they still lag significantly behind strong K-fixed baselines. To address these challenges, we propose Quality-Guided K-Adaptive Slot Attention (QASA). First, we decouple slot selection from reconstruction, eliminating the mutual constraints between the two objectives. Then, we propose an unsupervised Slot-Quality metric to assess per-slot quality, providing a principled signal for fine-grained slot-object binding. Based on this metric, we design a Quality-Guided Slot Selection scheme that dynamically selects a subset of high-quality slots and feeds them into our newly designed gated decoder for reconstruction during training. At inference, token-wise competition on slot attention yields a K-adaptive outcome. Experiments show that QASA substantially outperforms existing K-adaptive methods on both real and synthetic datasets. Moreover, on real-world datasets QASA surpasses K-fixed methods.
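The abstract states that, at inference, token-wise competition on slot attention makes the slot count adaptive. The NumPy sketch below illustrates only that idea, under stated assumptions: attention is a softmax over the slot axis (as in the original Slot Attention formulation), each token is then hard-assigned to its highest-weight slot, and slots that win no token are treated as inactive. The `select_by_quality` helper, its `quality` scores and its `threshold` are hypothetical placeholders; the paper's Slot-Quality metric, the iterative slot updates and the gated decoder are not reproduced here.

```python
import numpy as np

def token_wise_attention(tokens: np.ndarray, slots: np.ndarray) -> np.ndarray:
    """Softmax over the slot axis, so slots compete for each token.
    Returns an (N_tokens, K) matrix whose rows sum to 1."""
    d = tokens.shape[-1]
    logits = tokens @ slots.T / np.sqrt(d)                # (N, K) scaled dot products
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def active_slots(attn: np.ndarray) -> np.ndarray:
    """Hard competition: each token goes to its highest-weight slot.
    Slots that win no token are dropped, so the surviving count adapts to the scene."""
    winners = attn.argmax(axis=1)                          # (N,) winning slot per token
    return np.unique(winners)                              # sorted indices of used slots

def select_by_quality(quality: np.ndarray, threshold: float) -> np.ndarray:
    """HYPOTHETICAL selection rule: keep slots whose placeholder quality score
    exceeds a threshold. This is not QASA's actual Slot-Quality metric."""
    return np.flatnonzero(quality > threshold)

# Toy scene: 4 tokens forming two clusters, 4 candidate slots.
tokens = np.array([[5.0, 0.0], [5.0, 0.1], [0.0, 5.0], [0.1, 5.0]])
slots = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])

attn = token_wise_attention(tokens, slots)
print(active_slots(attn).tolist())   # [0, 1] -> effective K = 2 of 4 slots

quality = np.array([0.9, 0.8, 0.1, 0.2])  # placeholder per-slot scores
print(select_by_quality(quality, threshold=0.5).tolist())  # [0, 1]
```

Because the softmax normalizes across slots rather than across tokens, the number of slots that end up owning pixels is decided by the data, not fixed in advance; a separate quality signal can then filter which slots are passed on.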

Tianran Ouyang, Xingping Dong, Jing Zhang, Mang Ye, Jun Chen, Bo Du · 2026

Related benchmarks

Task                              Dataset     Metric  Result  Rank
Unsupervised Object Segmentation  COCO        mBO^i   36.7    26
Object-Centric Learning           Pascal      mBO^i   49.7    18
Object-Centric Learning           MOVi-C      mBO^i   46.9    17
Object-Centric Learning           MOVi-E      mBO^i   39.1    13
Object Discovery                  COCO        --      --      13
Object Discovery                  MOVi-C      mBO^i   46.9    6
Object Discovery                  MOVi-E      mBO^i   39.1    4
Object Discovery                  Pascal VOC  mBO^i   49.7    3
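The table reports mBO^i. As commonly defined in the object-centric literature, mean best overlap (mBO) matches each ground-truth mask to the predicted (slot) mask with the highest IoU and averages those best IoUs; the superscript i usually denotes instance-level ground-truth masks. The NumPy sketch below assumes that definition for a single image; background handling and how scores are aggregated across a dataset vary between papers and are not specified on this page.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union > 0 else 0.0

def mean_best_overlap(gt_masks, pred_masks) -> float:
    """For each ground-truth mask, take the best IoU over all predicted masks,
    then average over the ground-truth masks of one image."""
    best = [max(iou(g, p) for p in pred_masks) for g in gt_masks]
    return float(np.mean(best))

# Toy 1x4 "image": two ground-truth objects and two predicted slot masks.
gt = [np.array([1, 1, 0, 0], bool), np.array([0, 0, 1, 1], bool)]
pred = [np.array([1, 1, 0, 0], bool), np.array([0, 0, 1, 0], bool)]

score = mean_best_overlap(gt, pred)
print(score)  # 0.75  (best IoUs: 1.0 for the first object, 0.5 for the second)
```

Since each ground-truth object picks its own best slot, extra unused slots do not lower mBO, whereas an object split across several slots does.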
