| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Sentiment Analysis | MOSEI (test) | MAE 0.523 | 49 |
| Emotion Recognition | MOSEI | Accuracy (7-Class) 51.88 | 26 |
| Multimodal Sentiment Analysis | MOSEI | F-Score 87.09 | 22 |
| Binary Sentiment Classification | MOSEI (test) | Accuracy 82.5 | 16 |
| Sentiment Analysis | MOSEI (held-out) | F1 Score 72.4 | 8 |
| Sentiment Analysis (SEN) | MOSEI S | Binary Weighted F1 77.5 | 7 |
| Emotion Recognition (EMO) | MOSEI E | Mean Weighted Accuracy 61.4 | 7 |
| Streaming Social Task Detection | MOSEI | Accuracy 85.64 | 7 |
| Sentiment Classification | MOSEI (test) | Accuracy (2 Class) 85 | 7 |
| Multi-label Classification | MOSEI | F1 (Happy) 72.7 | 5 |
| Binary Classification | MOSEI | F1 (Happy) 71.7 | 5 |
| Cross-modal retrieval (Video) | MOSEI | R@1 39.2 | 4 |
| Cross-modal retrieval (Language) | MOSEI | Recall@1 34.6 | 4 |
| Cross-modal retrieval (Audio) | MOSEI | R@1 34.6 | 4 |
| Sentiment Analysis | MOSEI | Number of Parameters 256,453 | 4 |
| Audiovisual Sentiment Analysis | MOSEI | Accuracy (AV) 67.19 | 3 |
| Model Selection | MOSEI (unseen) | Performance 99.35 | 1 |