| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| Flickr30K | M2-Encoder | R@1: 92.2 | 460 | 2d ago | |
| Flickr30k (test) | BLIP-2 | Recall@1: 89.7 | 423 | 2d ago | |
| Flickr30K 1K (test) | ERNIE-ViL 2.0 | R@1: 93.3 | 375 | 2d ago | |
| MSCOCO 5K (test) | MAP | R@1: 60.9 | 286 | 2d ago | |
| MS-COCO 5K (test) | BLIP-2 ViT-g | R@1: 68.3 | 223 | 2d ago | |
| COCO | | Recall@1: 68.3 | 130 | 2d ago | |
| MSCOCO | BLIP | R@1: 64.3 | 118 | 3d ago | |
| MSCOCO 1K (test) | AltCLIP | R@1: 6,390 | 104 | 3d ago | |
| Flickr30K-CN | R2D2 | R@1: 84.4 | 99 | 3d ago | |
| CUHK-PEDES (test) | CADA-L | Recall@1: 78.37 | 96 | 3d ago | |
| MS-COCO | | R@5: 90.8 | 79 | 3d ago | |
| DCI | Qwen3-VL-Embedding | R@1: 79.7 | 68 | 3d ago | |
| MS-COCO (test) | K-LITE | R@1: 2,208 | 66 | 3d ago | |
| RSITMD (test) | GeoRSCLIP-FT | R@1: 25.04 | 61 | 3d ago | |
| COCO-CN | M2-Encoder | R@1: 78.7 | 49 | 3d ago | |
| CC152K | L2RM-SGRAF | R@1: 42.8 | 48 | 3d ago | |
| Flickr30k (1K) | ALIGN | R@1: 84.9 | 48 | 3d ago | |
| MS-COCO 1K | CRCL | R@1: 65.1 | 43 | 3d ago | |
| MSCOCO (5K) | AMoE | R@1: 53.98 | 42 | 3d ago | |
| MUGE | CN-CLIP | R@1: 68.9 | 40 | 3d ago | |
| MS-COCO 5K | SoftMask++ | R@1: 54.1 | 39 | 3d ago | |
| ShareGPT4V | LamRA | R@1: 97.9 | 35 | 3d ago | |
| Flickr | U-MARVEL+ | R@1: 98.9 | 35 | 3d ago | |
| RSICD (test) | GeoRSCLIP-FT | R@1: 15.59 | 34 | 3d ago | |
| Urban-1K | LamRA | R@1: 98.8 | 34 | 3d ago | |