Multimodal Deep Learning

About

This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed in which one modality is transformed into the other, as well as models in which one modality is utilized to enhance representation learning for the other. To conclude the second part, architectures that focus on handling both modalities simultaneously are introduced. Finally, we also cover other modalities as well as general-purpose multimodal models, which are able to handle different tasks on different modalities within one unified architecture. One interesting application (Generative Art) caps off this booklet.

Cem Akkus, Luyang Chu, Vladana Djakovic, Steffen Jauch-Walser, Philipp Koch, Giacomo Loss, Christopher Marquardt, Marco Moldovan, Nadja Sauter, Maximilian Schneider, Rickmer Schulte, Karol Urbanczyk, Jann Goschenhofer, Christian Heumann, Rasmus Hvingelby, Daniel Schalk, Matthias Aßenmacher • 2023

Related benchmarks

Task                            | Dataset                                       | Metric                   | Result | Rank
--------------------------------|-----------------------------------------------|--------------------------|--------|-----
Readmission prediction          | MIMIC IV                                      | AUC-ROC                  | 0.6817 | 19
Grasp Detection                 | Cornell Grasping Dataset (Image-wise split)   | -                        | -      | 17
Coarse Sentiment Classification | Hotel Review dataset                          | Coarse Acc               | 81.49  | 12
Fine Sentiment Classification   | Hotel Review dataset                          | F-Score Accuracy         | 67.32  | 12
Sentiment Regression            | Hotel Review dataset                          | MAE                      | 0.0714 | 12
Mortality Prediction            | eICU                                          | AUC-ROC                  | 0.8624 | 9
Readmission prediction          | eICU                                          | AUC-ROC                  | 0.7462 | 9
Grasp Detection                 | Cornell Grasping Dataset (Object-wise split)  | Point Grasp Success Rate | 70.7   | 8
Multi-modal Reconstruction      | RoboMNIST bimodal real-world (train)          | Sensor Modality Loss     | 26.945 | 3
Multi-modal Reconstruction      | RoboMNIST bimodal real-world (test)           | Sensor Modality Loss     | 33.563 | 3