| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Semantic Representation Evaluation | ARCH (test) | RAVDESS 48.96 | 13 |
| Semantic Representation Classification | ARCH Reconstruction Domain | RAVDESS Accuracy 81.25 | 10 |
| Speech Classification | ARCH | RAVDESS Score 37.5 | 8 |
| Image-to-Text Cross-modal Retrieval | ARCH (test) | R@1 1,256 | 8 |
| Text-to-Image Cross-modal Retrieval | ARCH (test) | R@1 11.17 | 8 |
| Image-to-Text Retrieval | ARCH | R@1 9.97 | 7 |
| Text-to-Image Retrieval | ARCH | R@1 8.89 | 7 |