Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation

About

We present SlotAdapt, an object-centric learning method that combines slot attention with pretrained diffusion models by introducing adapters for slot-based conditioning. Our method preserves the generative power of pretrained diffusion models, while avoiding their text-centric conditioning bias. We also incorporate an additional guidance loss into our architecture to align cross-attention from adapter layers with slot attention. This enhances the alignment of our model with the objects in the input image without using external supervision. Experimental results show that our method outperforms state-of-the-art techniques in object discovery and image generation tasks across multiple datasets, including those with real images. Furthermore, we demonstrate through experiments that our method performs remarkably well on complex real-world images for compositional generation, in contrast to other slot-based generative methods in the literature. The project page can be found at https://kaanakan.github.io/SlotAdapt/.
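The guidance loss described above can be illustrated with a small sketch. This is only one plausible form, not the authors' implementation: it assumes both attention maps are stored as `[positions][slots]` grids normalized over slots, that the slot-attention map serves as a fixed target, and that the alignment term is a mean binary cross-entropy. The function names `attention_over_slots` and `guidance_loss` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_slots(logits):
    """logits[p][k] is the score of slot k at position p.

    Normalizes over slots at each position, so slots compete for positions.
    """
    return [softmax(row) for row in logits]


def guidance_loss(cross_attn, slot_attn, eps=1e-8):
    """Mean binary cross-entropy between two [positions][slots] maps.

    Assumption: slot_attn is the target and cross_attn is the map
    being aligned to it; the paper may define the term differently.
    """
    total, count = 0.0, 0
    for ca_row, sa_row in zip(cross_attn, slot_attn):
        for ca, sa in zip(ca_row, sa_row):
            ca = min(max(ca, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= sa * math.log(ca) + (1.0 - sa) * math.log(1.0 - ca)
            count += 1
    return total / count


# Two positions, two slots: slot 0 owns position 0, slot 1 owns position 1.
slot_attn = attention_over_slots([[4.0, 0.0], [0.0, 4.0]])
aligned = guidance_loss(slot_attn, slot_attn)
swapped = guidance_loss([row[::-1] for row in slot_attn], slot_attn)
print(aligned < swapped)
```

In this toy case the loss is lower when the cross-attention map assigns each position to the same slot as slot attention does, which is the behavior an alignment term of this kind is meant to encourage.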

Adil Kaan Akan, Yucel Yemez • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Unsupervised Object Segmentation | COCO | mBOi | 35.1 | 26 |
| Unsupervised Object Segmentation | MOVi-E (test) | mBO | 43.38 | 11 |
| Unsupervised Object Segmentation | VOC | FG-ARI | 29.6 | 9 |
| Downstream Property Prediction | MOVi-C | Position Error | 1.14 | 7 |
| Property Prediction | MOVi-E | Position MSE | 1.77 | 5 |
| Compositional Generation | COCO | KID | 0.0344 | 4 |
| Reconstruction | COCO | KID (x10³) | 3.90e-4 | 4 |
| Unsupervised Object Segmentation | MOVi-C (test) | mBO | 45.57 | 4 |