| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Image-to-Text Retrieval | Flickr30K 1K (test) | R@1 | 99.1 | 439 |
| Text-to-Image Retrieval | Flickr30k (test) | R@1 | 89.7 | 423 |
| Image-to-Text Retrieval | Flickr30k | R@1 | 99 | 379 |
| Text-to-Image Retrieval | Flickr30K 1K (test) | R@1 | 93.3 | 375 |
| Image-to-Text Retrieval | Flickr30k (test) | R@1 | 96.6 | 370 |
| Image Retrieval | Flickr30k (test) | R@1 | 85.6 | 195 |
| Image Captioning | Flickr30K | CIDEr | 87.12 | 111 |
| Image Captioning | Flickr30k (test) | CIDEr | 98.3 | 103 |
| Text-to-Image Retrieval | Flickr30K-CN | R@1 | 84.4 | 99 |
| Image-to-Text Retrieval | Flickr30K-CN | R@1 | 96.6 | 99 |
| Text Retrieval | Flickr30K (test) | R@1 | 95.9 | 89 |
| Text Retrieval | Flickr30K 1K (test) | R@1 | 98.5 | 82 |
| Image Captioning | Flickr30K (Karpathy split) | CIDEr | 123.1 | 76 |
| Text Retrieval | Flickr30K | R@1 | 97.3 | 75 |
| Image Retrieval | Flickr30K 1K (test) | R@1 | 91.1 | 70 |
| Image-to-Text Retrieval | Flickr30K 1K Karpathy (test) | R@1 | 97.2 | 59 |
| Image Captioning | Flickr30k | CIDEr | 84.5 | 55 |
| Text-to-Image Retrieval | Flickr30k (1K) | R@1 | 84.9 | 48 |
| Image Annotation | Flickr30k (test) | R@1 | 55.5 | 39 |
| Caption Retrieval | Flickr30k (test) | R@1 | 76.4 | 36 |
| Phrase Localization | Flickr30K Entities (test) | Accuracy | 83.15 | 35 |
| Sentence Retrieval | Flickr30K | R@1 | 3,740 | 32 |
| Text Retrieval | Flickr30K (1K) | R@1 | 95.4 | 30 |
| Text Retrieval | Flickr30k Zero-shot (test) | R@1 | 94.1 | 30 |
| Image-to-Text Retrieval | Flickr30k (1K) | R@1 | 93.4 | 30 |
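Most retrieval rows above report R@1 (Recall@1). The sketch below shows one common way to compute Recall@K from a precomputed query-by-candidate similarity matrix. It is illustrative only: the function name `recall_at_k` and the input layout (one set of relevant candidate indices per query) are assumptions, and individual papers may differ in tie-breaking or in how the multiple reference captions per Flickr30K image are counted. Here a query counts as a hit if any relevant candidate appears in its top K.

```python
from typing import Sequence, Set


def recall_at_k(sims: Sequence[Sequence[float]],
                relevant: Sequence[Set[int]],
                k: int) -> float:
    """Hypothetical helper: percentage of queries whose top-k candidates
    contain at least one relevant candidate index."""
    if len(sims) != len(relevant):
        raise ValueError("need exactly one relevant-index set per query")
    hits = 0
    for row, gold in zip(sims, relevant):
        # Rank candidate indices by descending similarity; Python's sort is
        # stable, so tied scores keep their original index order.
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if gold.intersection(ranked[:k]):
            hits += 1
    return 100.0 * hits / len(sims)


# Toy image-to-text example: 2 image queries, 4 candidate captions.
# Image 0 is described by captions {0, 1}, image 1 by captions {2, 3}.
sims = [[0.9, 0.1, 0.3, 0.2],
        [0.8, 0.1, 0.7, 0.6]]
relevant = [{0, 1}, {2, 3}]
print(recall_at_k(sims, relevant, 1))
print(recall_at_k(sims, relevant, 2))
```

In the toy data, image 1 ranks caption 0 first, so it misses at K=1 but hits at K=2. For text-to-image retrieval the same function applies with captions as queries, each typically having a single relevant image.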