Generalized Multimodal ELBO

About

Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research. However, existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models: their posterior approximation functions lead to a trade-off between the semantic coherence and the ability to learn the joint data distribution. We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations. The new objective encompasses two previous methods as special cases and combines their benefits without compromises. In extensive experiments, we demonstrate the advantage of the proposed method compared to state-of-the-art models in self-supervised, generative learning tasks.
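The abstract does not spell out the posterior approximation, so the following is only a rough sketch of one way an objective could contain two earlier methods as special cases. It assumes the joint posterior is an equally weighted mixture over subsets of modalities, where each mixture component is a product of Gaussian experts. The function names, the 1-D Gaussians, and the example values are hypothetical illustrations, not taken from this page.

```python
from itertools import combinations


def product_of_gaussians(experts):
    """Combine 1-D Gaussian experts given as (mean, variance) pairs.

    Precisions (1 / variance) add up, and the combined mean is the
    precision-weighted average of the expert means. No prior expert
    is included here (a simplifying assumption).
    """
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision


def nonempty_subsets(names):
    """Yield every non-empty subset of the modality names as a tuple."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def mixture_of_products(unimodal):
    """Build one mixture component per non-empty subset of modalities.

    `unimodal` maps a modality name to its (mean, variance) posterior.
    Returns {subset: (weight, mean, variance)} with equal weights
    (equal weighting is an assumption of this sketch).
    """
    subsets = list(nonempty_subsets(sorted(unimodal)))
    weight = 1.0 / len(subsets)
    return {
        subset: (weight, *product_of_gaussians([unimodal[n] for n in subset]))
        for subset in subsets
    }


# Hypothetical unimodal posteriors for two modalities.
components = mixture_of_products({"image": (0.0, 1.0), "text": (2.0, 1.0)})
for subset, (weight, mean, variance) in components.items():
    print(subset, weight, mean, variance)
```

Under these assumptions, keeping only the single-modality subsets gives a plain mixture of experts, and keeping only the full subset gives a plain product of experts, which is one reading of "encompasses two previous methods as special cases".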

Thomas M. Sutter, Imant Daunhawer, Julia E. Vogt • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Multimodal Classification | MST Missing Modalities | Accuracy 99.62 | 28 |
| Multimodal Classification | PolyMNIST Missing Rate η=0.6 | Accuracy 96.81 | 16 |
| Multimodal Classification | PolyMNIST Missing Rate η=0.8 | Accuracy 87.06 | 16 |
| Multimodal Classification | PolyMNIST Missing Rate η=0 | Accuracy 99.79 | 16 |
| Multimodal Classification | MST Missing Modalities {S,T} | Accuracy 0.965 | 14 |
| Multimodal Classification | CelebA Missing Modalities {I} | Accuracy 56.9 | 14 |
| Multimodal Classification | CelebA Missing Modalities {T} | Accuracy 65.75 | 14 |
| Multimodal Classification | CelebA Missing Modalities | Accuracy 68.22 | 14 |
| Caption-only Clustering | CUB Image-Captions for Clustering (CUBICC) (test) | ACC 43.5 | 10 |
| Image-only Clustering | CUB Image-Captions for Clustering (CUBICC) (test) | Accuracy 33.4 | 10 |
Showing 10 of 11 rows
