| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Classification | AV-MNIST | Accuracy 72.38 | 24 |
| Multi-modal classification | AV-MNIST (val) | Accuracy (Audio) 42.32 | 10 |
| Multimodal Classification | AV-MNIST 100% conflict (test) | Recall 99.61 | 8 |
| Multimodal Classification | Conflict-AV-MNIST 50% (test) | Recall 99.45 | 8 |
| Multimodal Classification | Conflict-AV-MNIST 0% (test) | Recall 99.22 | 8 |
| Digit Classification | AV-MNIST Vision modality standard (test) | Accuracy 71.32 | 4 |
| Clustering | AV-MNIST 3 modal | Gap Statistic 0.24 | 3 |
| Cross-modal Retrieval | AV-MNIST 3 modal | Gap 0.09 | 3 |