MMEB

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Multimodal Retrieval | MMEB | Classification Score: 788.1 | 50 |
| Image Embedding | MMEB v1 (test) | Classification: 67.6 | 23 |
| Multimodal Ranking | MMEB | Classification Score: 70 | 22 |
| Multi-modal Embedding | MMEB 1.0 (test) | Classification Accuracy: 65.6 | 18 |
| Multi-modal Representation Learning | MMEB OOD 1.0 | OOD Precision@1: 59.1 | 18 |
| Multi-modal Representation Learning | MMEB In-Distribution 1.0 | MMEB IND Precision@1: 71.6 | 18 |
| Multi-modal Representation Learning | MMEB Overall 1.0 | Classification P@1: 61.6 | 18 |
| Multimodal Embedding Evaluation | MMEB Overall | Classification Score: 72.6 | 18 |
| Retrieval | MMEB v2 | Image Retrieval Score: 78.2 | 18 |
| Multimodal Embedding Evaluation | MMEB V2 (test) | Image CLS Hit@1: 67.1 | 14 |
| Multimodal Retrieval and Understanding | MMEB V2 (test) | Image CLS Acc: 76.7 | 14 |
| Zero-shot Image Classification | MMEB (val) | Image Classification Accuracy: 66.8 | 9 |
| Multimodal Visual Document Retrieval | MMEB Visual Document portion v2 | ViDoRe ArXivQA Score: 88.7 | 9 |
| Multimodal Video Retrieval | MMEB Video portion v2 | K700 Score: 56.8 | 9 |
| Video Retrieval | MMEB Video Retrieval (MSRVTT, MSVD, DiDeMo, YouCook2, VATEX) v2 (test) | Retrieval Score: 43.1 | 8 |
| Video Classification | MMEB Video Classification (Kinetics-700, SSv2, HMDB, UCF, Breakfast) v2 (test) | Classification Accuracy: 63.7 | 8 |
| Video Question Answering | MMEB Video QA v2 (test) | Average Score: 72.5 | 6 |
| Video Understanding | MMEB Video v2 | Overall Score: 59.9 | 5 |
| Video Action Recognition | MMEB Video zero-shot | Overall Accuracy: 63 | 2 |
| Video Retrieval | MMEB Video Retrieval (MSRVTT, MSVD, DiDeMo, YouCook2, VATEX) v2 | Retrieval Score: - | 0 |
| Video Classification | MMEB Kinetics-700, SSv2, HMDB, UCF, Breakfast v2 | Classification Accuracy: - | 0 |