| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Sentiment Analysis | MOSEI | MAE 0.486 | 183 |
| Multimodal Sentiment Analysis | MOSEI (test) | MAE 0.523 | 49 |
| Emotion Recognition | MOSEI | Accuracy (7-Class) 51.88 | 26 |
| Binary Sentiment Classification | MOSEI (test) | Accuracy 82.5 | 16 |
| Multimodal Sentiment Analysis | MOSEI 2018b (test) | Acc2 (pos/neg) 85.88 | 13 |
| Active Modality Acquisition | MOSEI Text imputed by Image and Audio (test) | AUROC 1.207 | 11 |
| Active Modality Acquisition | MOSEI Image imputed by Text | AUROC (G_full) 99.5 | 11 |
| Active Modality Acquisition | MOSEI Text imputed by Image | AUROC (G_full) 1.613 | 11 |
| Active Modality Acquisition | MOSEI Audio imputed by Image and Text (test) | AUROC (G_full) 1.215 | 11 |
| Active Modality Acquisition | MOSEI | AUROC (G_full) 1.478 | 11 |
| Active Modality Acquisition | MOSEI Audio imputed by Image | AUROC 1.238 | 11 |
| Active Modality Acquisition (Image imputed by Audio) | MOSEI | AUROC 1.052 | 11 |
| Active Modality Acquisition | MOSEI Image imputed by Text and Audio (test) | AUROC (G_full) 1.321 | 11 |
| Active Modality Acquisition | MOSEI Audio imputed by Text (test) | AUROC 4.955 | 11 |
| Active Modality Acquisition (Text imputed by Audio) | MOSEI | AUROC 8.867 | 11 |
| Sentiment Analysis | MOSEI (held-out) | F1 Score 72.4 | 8 |
| Sentiment Analysis (SEN) | MOSEI S | Binary Weighted F1 77.5 | 7 |
| Emotion Recognition (EMO) | MOSEI E | Mean Weighted Accuracy 61.4 | 7 |
| Streaming Social Task Detection | MOSEI | Accuracy 85.64 | 7 |
| Sentiment Classification | MOSEI (test) | Accuracy (2 Class) 85 | 7 |
| Classification | MOSEI | Accuracy 77 | 6 |
| Multi-label Classification | MOSEI | F1 (Happy) 72.7 | 5 |
| Binary Classification | MOSEI | F1 (Happy) 71.7 | 5 |
| Cross-modal Retrieval (Video) | MOSEI | R@1 39.2 | 4 |
| Cross-modal Retrieval (Language) | MOSEI | Recall@1 34.6 | 4 |