Object-centric Learning with Cyclic Walks between Parts and Whole

About

Learning object-centric representations from complex natural environments equips both humans and machines with the ability to reason from low-level perceptual features. To capture the compositional entities of a scene, we propose cyclic walks between perceptual features extracted from vision transformers and object entities. First, a slot-attention module interfaces with these perceptual features and produces a finite set of slot representations. These slots can bind to any object entities in the scene via inter-slot competition for attention. Next, we establish entity-feature correspondence with cyclic walks along paths of high transition probability, based on the pairwise similarity between perceptual features (aka "parts") and slot-bound object representations (aka "whole"). The whole is greater than its parts and the parts constitute the whole. The part-whole interactions form cycle consistencies that serve as supervisory signals to train the slot-attention module. Our rigorous experiments on seven image datasets in three unsupervised tasks demonstrate that networks trained with our cyclic walks can disentangle foregrounds and backgrounds, discover objects, and segment semantic objects in complex scenes. In contrast to object-centric models attached with a decoder for pixel-level or feature-level reconstruction, our cyclic walks provide strong learning signals while avoiding computation overhead and enhancing memory efficiency. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/Parts-Whole-Object-Centric-Learning/
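
The cyclic-walk idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it assumes cosine similarity between parts and slots, a softmax temperature `tau`, and a cross-entropy-style penalty that pushes each round trip to return to its starting point. The function name `cyclic_walk_losses` and all parameter names are hypothetical.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cyclic_walk_losses(features, slots, tau=0.1):
    """Illustrative cycle-consistency losses between parts and wholes.

    features: (N, D) array of perceptual features ("parts").
    slots:    (K, D) array of slot representations ("whole").
    tau:      softmax temperature (an assumed hyperparameter).
    """
    # Cosine similarity between every part and every slot (assumption).
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    s = slots / np.linalg.norm(slots, axis=1, keepdims=True)
    sim = f @ s.T / tau                      # (N, K)

    # Transition probabilities; each row sums to 1.
    part_to_whole = softmax(sim, axis=1)     # (N, K): part -> slot
    whole_to_part = softmax(sim.T, axis=1)   # (K, N): slot -> part

    # Round trips: whole -> parts -> whole and parts -> whole -> parts.
    wpw = whole_to_part @ part_to_whole      # (K, K)
    pwp = part_to_whole @ whole_to_part      # (N, N)

    # A walk is cycle-consistent when it ends where it started, so we
    # penalise low probability mass on the diagonal of each round trip.
    eps = 1e-8
    loss_wpw = -np.mean(np.log(np.diag(wpw) + eps))
    loss_pwp = -np.mean(np.log(np.diag(pwp) + eps))
    return wpw, pwp, loss_wpw, loss_pwp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    parts = rng.normal(size=(16, 8))   # stand-in for ViT patch features
    wholes = rng.normal(size=(4, 8))   # stand-in for slot-attention output
    wpw, pwp, l_wpw, l_pwp = cyclic_walk_losses(parts, wholes)
    print(wpw.shape, pwp.shape, float(l_wpw), float(l_pwp))
```

In this sketch no decoder is involved: the only training signal comes from the two round-trip matrices, which is the property the abstract contrasts with reconstruction-based object-centric models.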

Ziyu Wang, Mike Zheng Shou, Mengmi Zhang • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | PASCAL VOC 2012 | mIoU | 43.3 | 187 |
| Semantic segmentation | COCO-Stuff 27 | mIoU | 22.5 | 40 |
| Object Discovery | MOVi-C (val) | FG-ARI | 67.6 | 7 |
| Object Discovery | COCO 2017 (val) | FG-ARI | 39.7 | 6 |
| Unsupervised Foreground Extraction | CUB200 Birds (test) | mIoU | 72.4 | 5 |
| Unsupervised Foreground Extraction | Stanford Dogs (test) | mIoU | 86.2 | 5 |
| Unsupervised Foreground Extraction | Stanford Cars (test) | mIoU | 0.902 | 5 |
| Unsupervised Foreground Extraction | Flowers (test) | mIoU | 75.1 | 5 |
| Unsupervised Object Discovery | CLEVRTex (test) | FG-ARI | 67.4 | 5 |
| Object Discovery | PASCAL VOC 2012 (val) | FG-ARI | 29.6 | 3 |
Showing 10 of 11 rows
