| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image Captioning | ROCO v2 | METEOR 22.45 | 20 |
| Image-to-text retrieval | ROCO (test) | R@1 31.41 | 9 |
| Text-to-image retrieval | ROCO (test) | R@1 28.02 | 9 |
| Average Cross-modal Retrieval | ROCO non-rad | mAP 69.3 | 6 |
| image-to-text retrieval | ROCO | R@1 39.5 | 5 |
| Image-caption retrieval | ROCO | R@10 (Std) 66.1 | 4 |
| Latent Space Alignment | ROCO | Cos True Pairs 0.54 | 3 |
| Image Captioning | ROCO (test) | BLEU@1 16.96 | 3 |
| Medical image captioning | ROCO (test) | ROUGE-L 19.2 | 3 |
| text-to-image retrieval | ROCO | R@1 17.95 | 2 |