| Dataset Name | SOTA Method | Metric | Trend | Last Updated | |
|---|---|---|---|---|---|
| AEA 553 events (test) | SigLIP 2 | Top-5 Recall 54.6 | 48 | 2mo ago | |
| RSICD (test) | Full-FT GeoRSCLIP | Image-to-Text R@1 21.13 | 32 | 3mo ago | |
| Clotho (test) | CART | R@1 46.4 | 29 | 3mo ago | |
| RSITMD (test) | HarMA | R@1 (Image-to-Text) 32.74 | 28 | 3mo ago | |
| Flickr30k (test) | RCAR | Image-to-text Recall@1 82.3 | 25 | 2mo ago | |
| AudioCaps (test) | OmniBind | R@1 59.1 | 23 | 3mo ago | |
| CheXpert Plus | | Recall@1 13.83 | 20 | 1d ago | |
| MIMIC-CXR | | Recall@1 22.137 | 20 | 1d ago | |
| MSR-VTT (test) | HoPA | R@1 (V→T) 37.3 | 19 | 1mo ago | |
| MSCOCO 1K | MURAL-LARGE | Mean Recall (ja) 91.6 | 16 | 3mo ago | |
| InstVL video (Global) | InstAP | T2V R@1 94.5 | 12 | 1mo ago | |
| InstVL video (Instance) | InstAP | T2V Recall@1 60.63 | 12 | 1mo ago | |
| InstVL img-zero 10K (Global) | InstAP | T2V R@1 83.33 | 12 | 1mo ago | |
| InstVL img-zero 10K | InstAP | T2V R@1 28.25 | 12 | 1mo ago | |
| InstVL img-zero 1K (Global) | InstAP | T2V Recall@1 88.7 | 12 | 1mo ago | |
| InstVL img-zero 1K (Instance) | InstAP | T2V R@1 41.94 | 12 | 1mo ago | |
| InstVL img 10K (Global) | InstAP | T2V Recall@1 95.77 | 12 | 1mo ago | |
| InstVL img 10K | InstAP | T2V Recall@1 44.05 | 12 | 1mo ago | |
| InstVL img 1K (Global) | InstAP | T2V R@1 99.2 | 12 | 1mo ago | |
| InstVL img 1K Instance | InstAP | T2V R@1 50.25 | 12 | 1mo ago | |
| MSCOCO 5K (test) | DSMD | R@1 (I2T) 0.621 | 12 | 3mo ago | |
| UCM (test) | | R@1 (I2T) 28.57 | 12 | 3mo ago | |
| MSCOCO (5K) | ALIGN-L2 | Mean Recall (ja) 83.4 | 12 | 3mo ago | |
| Unseen Characters (Strict OOD Split) | ATRIE | mAP 75 | 10 | 1mo ago | |
| MS-COCO 1K image folds (test) | FedAFD | RSum@159.8 | 8 | 2mo ago | |