
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning

About

Object-centric learning aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable a variety of downstream tasks. Yet, object-centric learning struggles on real-world datasets, which contain multiple objects of complex textures and shapes in natural everyday scenes. To address this, we introduce Guided Latent Slot Diffusion (GLASS), a novel slot-attention model that learns in the space of generated images and uses semantic and instance guidance modules to learn better slot embeddings for various downstream tasks. Our experiments show that GLASS surpasses state-of-the-art slot-attention methods by a wide margin on tasks such as (zero-shot) object discovery and conditional image generation for real-world scenes. Moreover, GLASS enables the first application of slot attention to the compositional generation of complex, realistic scenes.

Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object-Centric Representation Learning | VOC | mBOi | 58.9 | 28 |
| Object-Centric Representation Learning | COCO | mBOi | 40.8 | 27 |
| Semantic-level object discovery | VOC | mIoU | 68.9 | 19 |
| Semantic-level object discovery | COCO | mIoU | 46.7 | 15 |
| Object Discovery | COCO | FG-ARI | 0.341 | 13 |
| Object Discovery | VOC | FG-ARI | 22.5 | 12 |
| Object Discovery | CLEVRTex (test) | mIoU (Intersection) | 47.2 | 5 |
| Object Discovery | Obj365 (test) | mIoUi | 19.6 | 5 |
| Conditional Image Generation | COCO | PSNR | 10.93 | 3 |
| Instance-level property prediction | VOC | Accuracy | 58.1 | 2 |
Showing 10 of 11 rows
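
For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how two of them are commonly computed: mean best overlap (mBO) for object discovery and PSNR for image generation. It is not the authors' evaluation code. The function names, the flattened-mask representation, and the `max_value` default are assumptions made for illustration, and the benchmark values above are presumably reported on a 0-100 scale rather than the 0-1 scale used here.

```python
import math
from typing import List, Sequence

# Assumption: a mask is a flattened binary sequence, 1 = pixel belongs to the object.
Mask = Sequence[int]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two flattened binary masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def mean_best_overlap(gt_masks: List[Mask], pred_masks: List[Mask]) -> float:
    """Average, over ground-truth masks, of the best IoU with any predicted mask."""
    if not gt_masks:
        return 0.0
    best = [max((iou(gt, p) for p in pred_masks), default=0.0) for gt in gt_masks]
    return sum(best) / len(best)


def psnr(reference: Sequence[float], generated: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: two ground-truth objects and two predicted slot masks on 4 pixels.
gt = [[1, 1, 0, 0], [0, 0, 1, 1]]
pred = [[1, 1, 1, 0], [0, 0, 0, 1]]
score = mean_best_overlap(gt, pred)
```

In this sketch, passing instance masks as `gt_masks` corresponds to an instance-level variant of the score, while passing per-class semantic masks would give a class-level variant. Each ground-truth mask is matched independently, so one predicted slot may be the best match for several objects.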
