
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers

About

Unsupervised object-centric learning aims to decompose scenes into interpretable object entities, termed slots. Slot-based auto-encoders stand out as a prominent method for this task. Within them, crucial aspects include guiding the encoder to generate object-specific slots and ensuring the decoder utilizes them during reconstruction. This work introduces two novel techniques: (i) an attention-based self-training approach, which distills superior slot-based attention masks from the decoder to the encoder, enhancing object segmentation, and (ii) a patch-order permutation strategy for autoregressive transformers that strengthens the role of slot vectors in reconstruction. The effectiveness of these strategies is showcased experimentally. The combined approach significantly surpasses prior slot-based autoencoder methods in unsupervised object segmentation, especially on complex real-world images. The implementation code is available at https://github.com/gkakogeorgiou/spot.
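The two techniques can be sketched in a few lines of NumPy. Both functions below are illustrative assumptions, not the authors' implementation: `permuted_patch_sequence` shows the patch-order permutation idea (decoding patches in a random order so the autoregressive transformer cannot rely on fixed raster-scan context alone), and `mask_distillation_loss` shows the self-training idea (treating decoder-derived slot masks as a teacher signal for the encoder's slot-attention masks).

```python
import numpy as np

def permuted_patch_sequence(patches, rng):
    """Reorder patches by a random permutation (hypothetical helper).

    An autoregressive decoder that must predict patches in a random
    order cannot lean on the fixed raster-scan context alone, which
    strengthens the role of the slot vectors in reconstruction.
    """
    perm = rng.permutation(patches.shape[0])  # new decoding order
    inverse = np.argsort(perm)                # restores raster order
    return patches[perm], perm, inverse

def mask_distillation_loss(enc_attn, dec_attn, eps=1e-8):
    """Cross-entropy from decoder slot masks (teacher) to encoder
    slot-attention masks (student); shapes (num_patches, num_slots),
    each row a distribution over slots. Hypothetical loss sketch.
    """
    return -np.mean(np.sum(dec_attn * np.log(enc_attn + eps), axis=1))

# Round-trip check: the inverse permutation restores the raster order.
rng = np.random.default_rng(0)
patches = np.arange(6 * 4, dtype=np.float32).reshape(6, 4)  # 6 patches, dim 4
shuffled, perm, inv = permuted_patch_sequence(patches, rng)
print(np.allclose(shuffled[inv], patches))  # True
```

Minimizing the distillation loss pushes the encoder's slot-attention masks toward the (typically cleaner) masks recovered from the decoder, which is the self-training signal described above.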

Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos, Nikos Komodakis • 2023

Related benchmarks

Task | Dataset | Result | Rank
---- | ------- | ------ | ----
Object Hallucination Evaluation | POPE | Accuracy: 79.69 | 935
Visual Question Answering | GQA | Accuracy: 54.94 | 374
Multimodal Evaluation | MM-Vet | Accuracy: 17.8 | 122
Counterfactual Reasoning | CVQA | Accuracy: 69.47 | 40
Multi-modal Perception Evaluation | MME Perception | Perception Score: 1170 | 31
Unsupervised Object Segmentation | COCO | mBO^i: 35 | 26
OOD Generalization | OODCV | Accuracy: 54.07 | 20
Vision-Language Compositionality | SugarCrepe | Accuracy: 74.08 | 20
Robustness to Natural Adversarial Examples | NaturalBench | Accuracy: 3.68 | 20
Semantic-level Object Discovery | VOC | mIoU: 55.3 | 19

(Showing 10 of 29 rows)

Other info

Code: https://github.com/gkakogeorgiou/spot