Multi-Level Variational Autoencoder: Learning Disentangled Representations from Grouped Observations
About
We would like to learn a representation of the data which decomposes an observation into factors of variation which we can independently control. Specifically, we want to use minimal supervision to learn a latent representation that reflects the semantics behind a specific grouping of the data, where within a group the samples share a common factor of variation. For example, consider a collection of face images grouped by identity. We wish to anchor the semantics of the grouping into a relevant and disentangled representation that we can easily exploit. However, existing deep probabilistic models often assume that the observations are independent and identically distributed. We present the Multi-Level Variational Autoencoder (ML-VAE), a new deep probabilistic model for learning a disentangled representation of a set of grouped observations. The ML-VAE separates the latent representation into semantically meaningful parts by working both at the group level and the observation level, while retaining efficient test-time inference. Quantitative and qualitative evaluations show that the ML-VAE model (i) learns a semantically meaningful disentanglement of grouped data, (ii) enables manipulation of the latent representation, and (iii) generalises to unseen groups.
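To make the group-level versus observation-level split concrete, the sketch below shows one way a shared group latent could be inferred from several observations. The abstract does not say how group evidence is combined. The sketch assumes a product of diagonal Gaussian densities, where each observation in a group contributes a Gaussian estimate of the shared factor: precisions add, and the group mean is the precision-weighted average. The function name `accumulate_group_evidence` and its input layout are hypothetical and chosen only for illustration.

```python
import math


def accumulate_group_evidence(mus, logvars, group_ids):
    """Fuse per-observation diagonal Gaussians into one Gaussian per group.

    Assumption (not stated in the abstract): group evidence is combined as a
    product of Gaussian densities. `mus` and `logvars` are lists of
    equal-length float lists; `group_ids` holds one hashable label per
    observation.
    Returns {group_id: (mean, logvar)}.
    """
    dim = len(mus[0])
    precision_sum = {}       # per group: sum of 1 / variance
    weighted_mean_sum = {}   # per group: sum of mean / variance

    for mu, logvar, group in zip(mus, logvars, group_ids):
        p_acc = precision_sum.setdefault(group, [0.0] * dim)
        m_acc = weighted_mean_sum.setdefault(group, [0.0] * dim)
        for d in range(dim):
            precision = math.exp(-logvar[d])
            p_acc[d] += precision
            m_acc[d] += mu[d] * precision

    posteriors = {}
    for group, p_acc in precision_sum.items():
        mean = [m / p for m, p in zip(weighted_mean_sum[group], p_acc)]
        logvar = [-math.log(p) for p in p_acc]
        posteriors[group] = (mean, logvar)
    return posteriors


# Two observations of group "a" and one of group "b", all with unit variance.
posteriors = accumulate_group_evidence(
    mus=[[0.0], [2.0], [5.0]],
    logvars=[[0.0], [0.0], [0.0]],
    group_ids=["a", "a", "b"],
)
```

Under this assumption, adding observations to a group can only shrink the variance of the group latent, while each observation would still keep its own separate observation-level latent (not modelled in this sketch).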
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| FoV regression | Cars3D (all) | R2 Score | 0.989 | 55 |
| Disentangled Representation Learning | Cars3D | FactorVAE | 0.87 | 35 |
| Disentanglement | Shapes3D | -- | -- | 18 |
| Abstract Visual Reasoning | Abstract Visual Reasoning WReN (10^2 samples) | Accuracy | 17.7 | 15 |
| Disentanglement | MPI3D | BetaVAE Score | 0.703 | 13 |
| Disentanglement | Shapes3D | BetaVAE Score | 0.976 | 13 |
| Pose Estimation | Pascal3D+ chair (test) | Median Angular Error (°) | 80.6 | 12 |
| Viewpoint Estimation | Pascal3D+ Car (test) | Median Error | 75.6 | 12 |
| Pose Estimation | Synthetic domain cars (unseen instances) | Med. Error (°) | 9.3 | 4 |