| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| RSICD (test) | Full-FT GeoRSCLIP | Image-to-Text R@1 | 21.13 | 32 | 4d ago |
| Clotho (test) | CART | R@1 | 46.4 | 29 | 4d ago |
| RSITMD (test) | HarMA | R@1 (Image-to-Text) | 32.74 | 28 | 4d ago |
| AudioCaps (test) | OmniBind | R@1 | 59.1 | 23 | 4d ago |
| Flickr30k (test) | RCAR | Image-to-text Recall@1 | 82.3 | 17 | 4d ago |
| MSCOCO 1K | MURAL-LARGE | Mean Recall (ja) | 91.6 | 16 | 4d ago |
| MSCOCO 5K (test) | DSMD | R@1 (I2T) | 0.621 | 12 | 4d ago |
| UCM (test) | | R@1 (I2T) | 28.57 | 12 | 4d ago |
| MSCOCO (5K) | ALIGN-L2 | Mean Recall (ja) | 83.4 | 12 | 4d ago |
| MSR-VTT 3 modal | CLIP | Gap | 29 | 7 | 4d ago |
| Sketch–Face (CUFS) | GCN | R@1 | 88.4 | 6 | 4d ago |
| FashionGen full 31 (test) | FashionSAP | Recall@1 | 58.63 | 6 | 3d ago |
| MS-COCO 5K | CRCL | rSum | 433.3 | 5 | 3d ago |
| Multimodal Dataset | Vision-Enhanced LLM | Recall@10 | 72.1 | 4 | 3d ago |
| Skull–Face IIT_Mandi_S2F | GCN | R@1 | 50 | 4 | 4d ago |
| AV-MNIST 3 modal | ModGap | Gap | 0.09 | 3 | 4d ago |
| MSCOCO 2 modal | ModGap | Gap | 0.03 | 3 | 4d ago |
| CIFAR10 2 modal | CLIP | Gap | 0.86 | 3 | 4d ago |
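
Most rows above report Recall@K (R@K) or rSum. As a rough illustration of how these retrieval metrics are conventionally computed, here is a minimal sketch. It assumes a square similarity matrix in which candidate `i` is the only correct match for query `i`, which is a simplification (some benchmarks pair each image with several captions). The function names `recall_at_k` and `rsum` are hypothetical, and the listed methods may use different evaluation protocols.

```python
import numpy as np


def recall_at_k(sim, k):
    """Percentage of queries whose correct candidate ranks in the top k.

    sim[i, j] is the similarity between query i and candidate j.
    Assumption: candidate i is the single correct match for query i.
    """
    sim = np.asarray(sim, dtype=float)
    correct = np.diag(sim)[:, None]
    # Zero-based rank of the correct candidate: how many candidates score strictly higher.
    ranks = (sim > correct).sum(axis=1)
    return 100.0 * float(np.mean(ranks < k))


def rsum(sim, ks=(1, 5, 10)):
    """Sum of Recall@K over both retrieval directions (rows as queries, then columns)."""
    sim = np.asarray(sim, dtype=float)
    return sum(recall_at_k(sim, k) + recall_at_k(sim.T, k) for k in ks)


# Toy 3x3 example: query 1 ranks a wrong candidate first, queries 0 and 2 are correct.
toy = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.3, 0.7],
])
print(recall_at_k(toy, 1))
print(rsum(toy))
```

Under this convention values are percentages, so rSum over K = 1, 5, 10 in two directions has a maximum of 600. Leaderboard entries may also report recall as a fraction, as the 0.621 row does, so values are only comparable within a row's own benchmark.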