| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| MMEB | UniME-V2 | Classification Score 788.1 | 50 | 1mo ago | |
| Multi30k (test) | CCLM | Recall (EN) 97.8 | 35 | 1mo ago | |
| M5Product | MOON2.0 | Recall@1 15.27 | 30 | 24d ago | |
| MS COCO Karpathy (test) | Multimodal Fusion | Recall@1 47.3 | 27 | 3d ago | |
| Flickr8k Karpathy (test) | Multimodal Fusion | Recall@1 71.8 | 27 | 3d ago | |
| MMEB Image V2 | UME-R1 | CLS Accuracy 69.1 | 22 | 15d ago | |
| VGGSound-S (test) | | Recall@1 (Video -> Text) 6.8 | 19 | 4d ago | |
| MMEB v1 (test) | VLM2VEC-7B | Classification 61.2 | 18 | 2d ago | |
| MMEB Total v2 | UME-R1 | Overall Score 68.1 | 15 | 1mo ago | |
| MMEB Video V2 | UME-R1 | CLS Accuracy 51.6 | 15 | 1mo ago | |
| MT-FIQ | V-Retrver-7B | Recall@5 68.3 | 15 | 1mo ago | |
| WikiVideo (test) | RankVideo | Alpha-nDCG 62.8 | 10 | 1mo ago | |
| MULTIMODALQA Doc (test) | VISRAG | Total Time (ms) 371 | 10 | 1mo ago | |
| WholeHouse-MM (test) | Affordance RAG | Target Object Recall@5 32.8 | 9 | 1mo ago | |
| M-BEIR extended (test) | MMEmbed | Recall (I→I) 32.1 | 7 | 1mo ago | |
| RAVENEA (test) | Ravenea-CLIP | MRR 82.17 | 7 | 1mo ago | |
| VisD | V-Retrver-7B | R@1 75.1 | 7 | 1mo ago | |
| GeneCIS | V-Retrver-7B | Recall@1 30.7 | 7 | 1mo ago | |
| EDIS-2 | GME-2B | Recall@5 70.32 | 6 | 1mo ago | |
| OVEN-8 | GME-2B | R@5 75.98 | 6 | 1mo ago | |
| OVEN-6 | GME-2B | R@5 58.17 | 6 | 1mo ago | |
| InfoSeek-8 | Reasoning-Augmented Representations | Recall@5 52.38 | 6 | 1mo ago | |
| InfoSeek 6 | Reasoning-Augmented Representations | R@5 42.08 | 6 | 1mo ago | |
| WebQA 2 | GME-2B | R@5 83.15 | 6 | 1mo ago | |
| WebQA 1 | GME-2B | Recall@5 95.19 | 6 | 1mo ago | |