Multi-Object Representation Learning with Iterative Variational Inference

About

Human perception is structured around objects, which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and represent objects jointly. We demonstrate that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations. Our method learns, without supervision, to inpaint occluded parts, and extrapolates to scenes with more objects and to unseen objects with novel feature combinations. We also show that, due to the use of iterative variational inference, our system is able to learn multi-modal posteriors for ambiguous inputs and extends naturally to sequences.

Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Chris Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner • 2019

Related benchmarks

Task	Dataset	Result
Unsupervised Object Segmentation	CLEVRTEX 1.0 (test)	FG-ARI 60.63	20
Unsupervised Object Segmentation	CLEVR 1.0 (test)	FG-ARI 93.81	16
Unsupervised Object Segmentation	OOD 1.0 (test)	FG-ARI 5.49e+3	16
Unsupervised Object Segmentation	CAMO 1.0 (test)	FG-ARI 38.29	16
Instance Segmentation	CLEVR	mIoU 45.1	11
Unsupervised Multi-object Segmentation	KITTI	FG-ARI 14.4	9
Object Discovery	CLEVRTEX (val)	mIoU 29.2	6
Latent-space disentanglement and controllability	CLEVR	Disentanglement 0.784	6
Object Discovery	KITTI (val)	Fg. ARI 14.4	6
Object Discovery	CATER original (test)	Fg. ARI 73.5	6

Showing 10 of 24 rows
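
Most rows above report FG-ARI, the adjusted Rand index computed only over pixels that the ground truth marks as foreground. The following is a minimal sketch of that metric in plain Python, not the evaluation code behind the listed numbers. The function names, the flat per-pixel label lists, and the choice of `background_id=0` are assumptions made for illustration. The sketch returns a score where 1.0 means a perfect match, whereas the table appears to use a 0 to 100 scale.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two labelings of the same items."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions agree trivially

    # Contingency counts: how many items share each (true, pred) label pair.
    joint = Counter(zip(true_labels, pred_labels))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())

    expected = sum_true * sum_pred / total_pairs
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_joint - expected) / (max_index - expected)


def fg_ari(true_labels, pred_labels, background_id=0):
    """ARI restricted to pixels whose ground-truth label is not background.

    background_id is a hypothetical convention chosen for this sketch.
    """
    kept = [(t, p) for t, p in zip(true_labels, pred_labels) if t != background_id]
    if not kept:
        raise ValueError("no foreground pixels to evaluate")
    true_fg, pred_fg = zip(*kept)
    return adjusted_rand_index(true_fg, pred_fg)
```

Because background pixels are dropped before scoring, a model is not penalized for how it splits or labels the background, and predicted segment ids only need to match the ground truth up to a relabeling.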
