Multimodal Generative Models for Scalable Weakly-Supervised Learning

About

Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multi-modal input either do not learn a joint distribution or require additional computation to handle missing data. Here, we introduce a multimodal variational autoencoder (MVAE) that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multi-modal inference problem. Notably, our model shares parameters to efficiently learn under any combination of missing modalities. We apply the MVAE to four datasets and match state-of-the-art performance using far fewer parameters. In addition, we show that the MVAE is directly applicable to weakly-supervised learning, and is robust to incomplete supervision. We then consider two case studies: one of learning image transformations (edge detection, colorization, segmentation) as a set of modalities, followed by one of machine translation between two languages. We find appealing results across this range of tasks.

Mike Wu, Noah Goodman • 2018
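
The product-of-experts inference mentioned in the abstract can be illustrated with a small sketch. It assumes each modality's encoder outputs a diagonal Gaussian (a mean and a variance per latent dimension) and that a standard normal prior acts as one more expert; under those assumptions the product of Gaussians is again Gaussian, with precisions summed and means precision-weighted. The function name `product_of_experts` and the convention of passing `None` for a missing modality are hypothetical, chosen for illustration, and are not taken from the authors' code.

```python
from typing import List, Optional, Sequence, Tuple

# One expert = (means, variances), each of length latent_dim.
Expert = Tuple[Sequence[float], Sequence[float]]


def product_of_experts(
    experts: Sequence[Optional[Expert]],
    latent_dim: int,
    include_prior: bool = True,
) -> Tuple[List[float], List[float]]:
    """Combine diagonal Gaussian experts into one Gaussian (mean, variance).

    A missing modality is passed as None and simply skipped, so the same
    function handles any subset of observed modalities.
    """
    precision = [0.0] * latent_dim      # running sum of 1 / variance
    weighted_mean = [0.0] * latent_dim  # running sum of mean / variance

    if include_prior:
        # Assumed prior expert: standard normal (mean 0, variance 1).
        for d in range(latent_dim):
            precision[d] += 1.0

    for expert in experts:
        if expert is None:  # modality not observed
            continue
        means, variances = expert
        for d in range(latent_dim):
            t = 1.0 / variances[d]
            precision[d] += t
            weighted_mean[d] += means[d] * t

    if any(p == 0.0 for p in precision):
        raise ValueError("need at least one expert (or the prior)")

    joint_var = [1.0 / p for p in precision]
    joint_mean = [m * v for m, v in zip(weighted_mean, joint_var)]
    return joint_mean, joint_var


# Hypothetical one-dimensional example with an image and a text expert.
image = ([2.0], [1.0])
text = ([4.0], [0.5])
both_mean, both_var = product_of_experts([image, text], latent_dim=1)
image_only_mean, image_only_var = product_of_experts([image, None], latent_dim=1)
```

Because missing experts are dropped rather than imputed, no separate inference network is needed for each subset of modalities, which is the parameter-sharing property the abstract describes.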

Related benchmarks

Task | Dataset | Metric | Result | Rank
Classification | YaleB (test) | Accuracy | 100 | 48
Behavior Decoding | NHP center-out reaching (test) | CC Accuracy | 0.544 | 15
Behavior Decoding | NHP grid reaching (test) | Accuracy (CC) | 42.5 | 15
PAWP Prediction | ASPIRE registry | AUROC | 0.758 | 10
Joint Clustering | CUB Image-Captions for Clustering (CUBICC) (test) | ACC | 38.7 | 10
Caption-only Clustering | CUB Image-Captions for Clustering (CUBICC) (test) | ACC | 18.1 | 10
Image-only Clustering | CUB Image-Captions for Clustering (CUBICC) (test) | Accuracy | 26.2 | 10
Multi-modal Image Synthesis (iUS + T2 inputs) | Brain Glioma Patients | T2 PSNR (dB) | 21.7 | 8
Image Synthesis (T2 to iUS) | Brain Glioma Patients | iUS PSNR | 21.21 | 6