| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Text-to-Image Retrieval | MSCOCO 5K (test) | R@1 60.9 | 286 |
| Image-to-text retrieval | MSCOCO | R@1 81.9 | 124 |
| Text-to-image retrieval | MSCOCO | R@1 64.3 | 118 |
| Visual Hallucination Evaluation | MSCOCO | CHAIR_i 18.2 | 104 |
| Text-to-Image Retrieval | MSCOCO 1K (test) | R@16,390 | 104 |
| Sentence Retrieval | MSCOCO 5k (test) | R@1 80.9 | 67 |
| Object Hallucination Evaluation | MSCOCO 2014 (val) | CHAIRs 54.6 | 55 |
| Object Hallucination | MSCOCO 500 images 2014 (val) | Consistency Score (CS) 60.6 | 50 |
| Image-to-text retrieval | MSCOCO 5K (test) | R@1 84.8 | 46 |
| Object Detection | MSCOCO (val) | AP 61.3 | 43 |
| Text-to-image Retrieval | MSCOCO (5K) | R@1 53.98 | 42 |
| Pointing game | MSCOCO 2014 (val) | Mean Accuracy (All) 69.9 | 42 |
| Text-to-Image Generation | MSCOCO 30K | FID 6.61 | 42 |
| Image Retrieval | MSCOCO @5000 (test) | mAP 87.27 | 39 |
| Object Hallucination Assessment | MSCOCO | CHAIR Instance Score 30.2 | 38 |
| Hallucination Evaluation | MSCOCO (val) | CHAIR_i 23.04 | 36 |
| Object Detection | MSCOCO 2017 (val) | APb 47.5 | 33 |
| Image-to-Text Retrieval | MSCOCO (test) | R@5 94.5 | 33 |
| Image-to-text Retrieval | MSCOCO (5K) | R@1 77.8 | 33 |
| Text-to-Image synthesis | MSCOCO | FID 6.95 | 31 |
| Text Retrieval | MSCOCO (test) | TR@1 77.6 | 31 |
| Image Retrieval | MSCOCO (test) | IR@1 60.7 | 31 |
| Image Captioning | MSCOCO (test) | CIDEr 149.6 | 29 |
| Text Retrieval | MSCOCO (5K) | R@1 83.2 | 29 |
| Text-to-image retrieval | MSCOCO 48 (test) | R@1 52.5 | 28 |