| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| RSICD | GeoMELT | Mean Recall | 40.72 | 119 | 18d ago |
| Flickr30K (test) | Uni-Perceiver-L + Conditional MoEs | R@1 (Img->Txt) | 94.1 | 45 | 22d ago |
| COCO (val) | DINOv2-ARL | R@1 | 60.1 | 43 | 1mo ago |
| RSICD (test) | | mR | 10.43 | 43 | 1mo ago |
| COCO (test) | BLIP_FUSECAP | Recall@1 | 97.2 | 41 | 1mo ago |
| General Domain | CLIPScore | Retrieval Score | 31.27 | 30 | 1mo ago |
| MS-COCO 5K (test) | CDDS | RSum (Composite Score) | 472.1 | 28 | 19d ago |
| MSCOCO (test) | CCLM | EN Retrieval Score | 95.6 | 28 | 22d ago |
| MSCOCO | MDPD | IR@1 | 81.9 | 27 | 3d ago |
| Flickr30K | MDCS-SGA | R@1 | 100 | 25 | 1mo ago |
| COCO 1.0 (test) | CoM-PT | R@1 | 48.79 | 24 | 4d ago |
| MS-COCO 1K (test) | CDDS | Image-to-Text R@1 | 84.9 | 24 | 1mo ago |
| MSCOCO 5K | HarmoCLIP | I-T Score | 69.78 | 24 | 6d ago |
| DCI long-text | DeBias-CLIP | Top-1 Accuracy | 63.4 | 22 | 19d ago |
| IIW (test) | DreamLIP-3m | Recall@1 | 77.9 | 21 | 22d ago |
| General RET | | Recall | 33.35 | 21 | 1mo ago |
| COCO | ACED-F2 | Retrieval Score | 58.3 | 21 | 1mo ago |
| Flickr30K | Ours♠ | I->T Retrieval Score | 86.4 | 18 | 6d ago |
| MSCOCO | CLIP | MR | 66.7 | 18 | 1mo ago |
| Flickr30K | CLIP | MR | 90.1 | 18 | 1mo ago |
| Flickr30k (val) | GoldiCLIP | Text-to-Image Recall@1 | 83 | 16 | 22d ago |
| MSCOCO (val) | GoldiCLIP | T2I Recall@1 | 55.5 | 16 | 22d ago |
| CBVS-20K (test) | UniCLIP | R@1 | 50.3 | 16 | 1mo ago |
| MIMIC 5x200 | LGDEA | Precision@1 | 56.31 | 15 | 1mo ago |
| IIW-FG sentence-level | GoldiCLIP | T2I Recall@1 | 44 | 12 | 22d ago |