| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multimodal Retrieval | MMEB | Classification Score | 788.1 | 50 |
| Image Embedding | MMEB v1 (test) | Classification | 67.6 | 23 |
| Multimodal Ranking | MMEB | Classification Score | 70 | 22 |
| Multi-modal Embedding | MMEB 1.0 (test) | Classification Accuracy | 65.6 | 18 |
| Multi-modal Representation Learning | MMEB OOD 1.0 | OOD Precision@1 | 59.1 | 18 |
| Multi-modal Representation Learning | MMEB In-Distribution 1.0 | MMEB IND Precision@1 | 71.6 | 18 |
| Multi-modal Representation Learning | MMEB Overall 1.0 | Classification P@1 | 61.6 | 18 |
| Multimodal Embedding Evaluation | MMEB Overall | Classification Score | 72.6 | 18 |
| Retrieval | MMEB v2 | Image Retrieval Score | 78.2 | 18 |
| Multimodal Embedding Evaluation | MMEB V2 (test) | Image CLS Hit@1 | 67.1 | 14 |
| Multimodal Retrieval and Understanding | MMEB V2 (test) | Image CLS Acc | 76.7 | 14 |
| Zero-shot Image Classification | MMEB (val) | Image Classification Accuracy | 66.8 | 9 |
| Multimodal Visual Document Retrieval | MMEB Visual Document portion v2 | ViDoRe ArXivQA Score | 88.7 | 9 |
| Multimodal Video Retrieval | MMEB Video portion v2 | K700 Score | 56.8 | 9 |
| Video Retrieval | MMEB Video Retrieval (MSRVTT, MSVD, DiDeMo, YouCook2, VATEX) v2 (test) | Retrieval Score | 43.1 | 8 |
| Video Classification | MMEB Video Classification (Kinetics-700, SSv2, HMDB, UCF, Breakfast) v2 (test) | Classification Accuracy | 63.7 | 8 |
| Video Question Answering | MMEB Video QA v2 (test) | Average Score | 72.5 | 6 |
| Video Understanding | MMEB Video v2 | Overall Score | 59.9 | 5 |
| Video Action Recognition | MMEB Video zero-shot | Overall Accuracy | 63 | 2 |
| Video Retrieval | MMEB Video Retrieval (MSRVTT, MSVD, DiDeMo, YouCook2, VATEX) v2 | Retrieval Score | - | 0 |
| Video Classification | MMEB Kinetics-700, SSv2, HMDB, UCF, Breakfast v2 | Classification Accuracy | - | 0 |
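Several rows report a top-1 metric (Precision@1, P@1, Hit@1). The sketch below shows how such a score is typically computed for an embedding model. It rests on a few assumptions. Each query is scored against its own candidate pool using cosine similarity, and each query has exactly one correct candidate. The result is reported as a percentage to match the scale of the table. The function names `cosine` and `precision_at_1` are hypothetical, and this is not the official MMEB evaluation code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def precision_at_1(
    queries: List[Vector],
    candidates: List[List[Vector]],
    targets: List[int],
) -> float:
    """Percentage of queries whose top-scoring candidate is the correct one.

    Assumption: queries[i] is a query embedding, candidates[i] is that query's
    candidate pool, and targets[i] is the index of the single correct candidate.
    Ties go to the lowest candidate index.
    """
    hits = 0
    for query, pool, target in zip(queries, candidates, targets):
        scores = [cosine(query, cand) for cand in pool]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == target)
    return 100.0 * hits / len(queries)


# Toy example: the first query retrieves its target and the second does not.
queries = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.2, 0.8]]]
targets = [0, 0]
print(precision_at_1(queries, candidates, targets))  # 50.0
```

With a single correct candidate per query, Precision@1 and Hit@1 reduce to the same quantity: whether the top-ranked item is the target.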