| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image Captioning Evaluation | Flickr8K Expert (test) | Kendall tau_c 56.4 | 76 |
| Image Search | Flickr8K | R@13,100 | 74 |
| Image Captioning Evaluation | Flickr8k Expert | Kendall Tau-c (tau_c) 59.7 | 73 |
| Image Captioning Evaluation | Flickr8K-CF (test) | Kendall tau_b 38.8 | 65 |
| Image Captioning Evaluation | Flickr8k-CF | Kendall-b Correlation (tau_b) 40.5 | 62 |
| Text Retrieval | Flickr8K (test) | R@5 84.2 | 31 |
| Image Captioning | Flickr8K (test) | BLEU@4 38.3 | 27 |
| Correlation with Human Judgment | Flickr8K-CF | Tau B 37.8 | 26 |
| Image-to-Text Retrieval | Flickr8k | R@1 58.5 | 22 |
| Text-to-Image Retrieval | Flickr8K-CN | R@1 70.1 | 19 |
| Image-to-Text Retrieval | Flickr8K CN | R@1 83.3 | 19 |
| Image Annotation | Flickr8K | R@1 43.4 | 18 |
| Correlation with human judgments | Flickr8K (Expert) | Kendall's Tau (τc) 56.4 | 17 |
| Correlation with human judgment | Flickr8K Expert 2013 (full) | Kendall's Tau 53 | 14 |
| Multimodal Alignment | FLICKR8K | Delta+ Mean Distance 0.503 | 12 |
| Image Search | Flickr8k (test) | R@1 41 | 11 |
| Text-to-Image Retrieval | Flickr8k Rephrased | Recall@5 95.9 | 6 |
| Image-to-Text Retrieval | Flickr8k-Rephrased | Recall@5 89.4 | 6 |
| Image Captioning | Flickr8k | BLEU@4 38.4 | 6 |
| Image Retrieval | Flickr8k zero-shot | R@1 44.4 | 6 |
| Text Retrieval | Flickr8k zero-shot | R@1 58.5 | 6 |
| Conditional Image Generation | Flickr8k | FID 31.15 | 5 |
| Image Annotation | Flickr8k (test) | R@1 13 | 4 |
| Image Captioning | Flickr8K 1,000 images (test) | BLEU-1 0.63 | 3 |
| Text Retrieval | Flickr8k | TRR 68.6 | 2 |