
Deep sprite-based image models: An analysis

About

While foundation models drive steady progress in image segmentation and diffusion algorithms compose ever more realistic images, the seemingly simple problem of identifying recurrent patterns in a collection of images remains very much open. In this paper, we focus on sprite-based image decomposition models, which have shown some promise for clustering and image decomposition and are appealing because of their high interpretability. These models come in different flavors, need to be tailored to specific datasets, and struggle to scale to images with many objects. We dive into the details of their design, identify their core components, and perform an extensive analysis on clustering benchmarks. We leverage this analysis to propose a deep sprite-based image decomposition method that performs on par with state-of-the-art unsupervised class-aware image segmentation methods on the standard CLEVR benchmark, scales linearly with the number of objects, explicitly identifies object categories, and fully models images in an easily interpretable way.
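To make the idea of composing an image from sprites concrete, here is a minimal sketch of back-to-front layered compositing. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the compositing rule, so the standard "over" blend is assumed, and the names `Layer` and `composite` are hypothetical. Each layer is assumed to hold a sprite that has already been positioned on the canvas, together with a per-pixel opacity mask.

```python
from dataclasses import dataclass
from typing import List

# Grayscale image stored as rows of pixel intensities.
Image = List[List[float]]


@dataclass
class Layer:
    """Hypothetical container: one positioned sprite and its opacity mask."""
    sprite: Image  # appearance of the sprite on the full canvas
    mask: Image    # per-pixel opacity in [0, 1]


def composite(background: Image, layers: List[Layer]) -> Image:
    """Blend layers back to front with the standard 'over' rule.

    Assumption: this blend is illustrative only. The work grows with
    len(layers), one pass over the canvas per layer.
    """
    canvas = [row[:] for row in background]  # copy, leave the input untouched
    for layer in layers:
        for i, row in enumerate(canvas):
            for j, below in enumerate(row):
                alpha = layer.mask[i][j]
                row[j] = alpha * layer.sprite[i][j] + (1.0 - alpha) * below
    return canvas
```

Because every pixel is explained by a sprite, a mask, and the background, the intermediate quantities can be inspected directly, which is the kind of interpretability the abstract refers to.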

Zeynep Sonat Baltacı, Romain Loiseau, Mathieu Aubry • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Clustering | Fashion MNIST | -- | 107 |
| Clustering | USPS | Accuracy 0.853 | 42 |
| Clustering | FRGC | Accuracy Coefficient 44.8 | 18 |
| Clustering | SVHN | Accuracy 52.4 | 18 |
| Clustering | MNIST | Clustering Accuracy 96.7 | 12 |
| Instance Segmentation | CLEVR | mIoU 53.8 | 11 |
| Multi-object semantic discovery | CLEVR | mAcc 70.6 | 6 |
| Multi-object semantic discovery | Multi-dSprites | mAcc 66 | 6 |
| Multi-object semantic discovery | CLEVR6 | mAcc 74.7 | 6 |
| Clustering | GTSRB-8 | Accuracy 80.9 | 5 |
Showing 10 of 13 rows
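The page does not say how the clustering accuracy figures are computed. A common convention, assumed here, is to score predictions under the best one-to-one mapping between cluster ids and ground-truth classes. The sketch below implements that convention by brute force over permutations, so it is only practical for a handful of clusters; the function name `clustering_accuracy` is hypothetical. It returns a fraction in [0, 1], while the table mixes fractions (0.853) and percentages (96.7).

```python
from itertools import permutations
from typing import Sequence


def clustering_accuracy(labels: Sequence[int], clusters: Sequence[int]) -> float:
    """Accuracy under the best one-to-one cluster-to-class mapping.

    Assumption: this is the usual convention, not a definition taken from
    the page. Brute force is O(n!) in the number of clusters or classes.
    """
    if len(labels) != len(clusters) or len(labels) == 0:
        raise ValueError("labels and clusters must be non-empty and equally long")

    classes = sorted(set(labels))
    cluster_ids = sorted(set(clusters))
    n = max(len(classes), len(cluster_ids))

    # counts[i][j]: samples in cluster i whose true class is j,
    # zero-padded to a square n x n matrix.
    counts = [[0] * n for _ in range(n)]
    for y, c in zip(labels, clusters):
        counts[cluster_ids.index(c)][classes.index(y)] += 1

    # Try every assignment of clusters to classes and keep the best one.
    best = max(
        sum(counts[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )
    return best / len(labels)
```

For example, `clustering_accuracy([0, 0, 1, 1], [1, 1, 0, 0])` returns `1.0`, since the clusters match the classes up to a relabeling. For larger numbers of clusters, the same optimum can be found in polynomial time with the Hungarian algorithm, for instance via `scipy.optimize.linear_sum_assignment`.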
