| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image-to-Text Retrieval | DOCCI (test) | Recall@1: 95.9 | 43 |
| Image-to-Text Retrieval | DOCCI | Recall@1: 85.2 | 38 |
| Text-to-Image Retrieval | DOCCI | Recall@1: 86.73 | 38 |
| Image Captioning | DOCCI 500 (test) | CAPTURE: 64.31 | 32 |
| Text-to-Image Retrieval | DOCCI (val) | Recall@1: 17.79 | 27 |
| Image-to-Text Retrieval | DOCCI (val) | Recall@1: 52.78 | 27 |
| Cross-modal Retrieval | DOCCI | Recall@1: 83.04 | 12 |
| Zero-shot Image-Text Retrieval | DOCCI | Accuracy: 66.2 | 7 |
| Image-to-Text Retrieval | DOCCI (full) | Recall@1: 91.3 | 6 |