Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views
About
Learning object-centric representations of multi-object scenes is a promising approach towards machine intelligence, facilitating high-level reasoning and control from visual sensory data. However, current approaches for unsupervised object-centric scene representation are incapable of aggregating information from multiple observations of a scene. As a result, these "single-view" methods form their representations of a 3D scene based only on a single 2D observation (view). Naturally, this leads to several inaccuracies, with these methods falling victim to single-view spatial ambiguities. To address this, we propose the Multi-View and Multi-Object Network (MulMON) -- a method for learning accurate, object-centric representations of multi-object scenes by leveraging multiple views. To sidestep the main technical difficulty of the multi-object-multi-view scenario -- maintaining object correspondences across views -- MulMON iteratively updates the latent object representations for a scene over multiple views. To ensure that these iterative updates do indeed aggregate spatial information to form a complete 3D scene understanding, MulMON is asked to predict the appearance of the scene from novel viewpoints during training. Through experiments, we show that MulMON resolves spatial ambiguities better than single-view methods -- learning more accurate and disentangled object representations -- and also achieves new functionality in predicting object segmentations for novel viewpoints.
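The idea of updating per-object latents view by view can be illustrated with a toy sketch. This is not MulMON's inference procedure: it assumes each object slot holds a diagonal Gaussian and swaps in a plain precision-weighted Gaussian fusion as a stand-in for the learned update. The names `fuse_view`, `aggregate_views`, `num_slots`, and `latent_dim` are hypothetical. The sketch only shows how carrying the posterior from one view forward as the prior for the next keeps slot order fixed, so no explicit cross-view object matching is needed.

```python
import numpy as np


def fuse_view(prior_mean, prior_var, obs_mean, obs_var):
    """Precision-weighted Gaussian update for one view (toy stand-in)."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs_mean / obs_var)
    return post_mean, post_var


def aggregate_views(view_estimates, num_slots, latent_dim):
    """Fold a sequence of per-view estimates into one set of slot latents.

    view_estimates: iterable of (obs_mean, obs_var) arrays, each of shape
    (num_slots, latent_dim). Slot k refers to the same object in every view
    because each update starts from the previous posterior (an assumption
    of this sketch).
    """
    mean = np.zeros((num_slots, latent_dim))  # standard-normal prior
    var = np.ones((num_slots, latent_dim))
    for obs_mean, obs_var in view_estimates:
        mean, var = fuse_view(mean, var, obs_mean, obs_var)
    return mean, var


# Two views that both suggest a latent value of 1.0 with unit variance.
views = [(np.ones((3, 2)), np.ones((3, 2))) for _ in range(2)]
mean, var = aggregate_views(views, num_slots=3, latent_dim=2)
```

In this toy setting the variance shrinks with every added view, which mirrors the intuition that additional viewpoints reduce single-view ambiguity.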
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Segmentation | CLE-MV | mIoU | 78.52 | 5 |
| Disentanglement Analysis | CLE-MV | Disentanglement | 0.65 | 4 |
| Novel-viewpoint Observation Prediction | CLE-MV | RMSE (pixel avg.) | 0.0464 | 4 |
| Segmentation | CLE-Aug (train) | mIoU | 71 | 3 |
| Segmentation | Black-Aug | mIoU | 68 | 3 |
| Segmentation | UnseenShape | mIoU | 0.64 | 3 |
| Disentanglement | CLE-Aug (train) | D Score | 0.63 | 2 |
| Disentanglement | Black-Aug | Disentanglement (D) | 0.55 | 2 |
| Disentanglement | UnseenShape | D Score | 0.5 | 2 |
| Disentanglement Analysis | CLE-Aug (test) | Disentanglement | 63 | 2 |
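
For readers unfamiliar with the two image-level metrics in the table, the following is a minimal sketch of how mIoU and pixel-averaged RMSE are commonly computed. It is not the evaluation code behind the reported numbers. It assumes predicted and ground-truth segment labels are already aligned (unsupervised segmentations usually need a matching step first, which is omitted here), and the function names `mean_iou` and `pixel_rmse` are hypothetical.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across labels present in pred or target."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # label absent from both maps, so it is skipped
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


def pixel_rmse(pred_img, true_img):
    """Root-mean-square error averaged over all pixels and channels."""
    diff = np.asarray(pred_img, dtype=float) - np.asarray(true_img, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
miou = mean_iou(pred, target, num_classes=2)          # (1/2 + 2/3) / 2
rmse = pixel_rmse(np.zeros((2, 2)), np.full((2, 2), 0.5))
```

Whether a score is reported as a fraction or a percentage depends on the benchmark, so values on different rows are not necessarily on the same scale.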