| Task Name | Dataset Name | SOTA Result | Trend | |
|---|---|---|---|---|
| Text-to-Image Retrieval | MSCOCO 5K (test) | R@1 60.9 | 312 | |
| Image-to-text retrieval | MSCOCO | R@1 81.9 | 152 | |
| Text Retrieval | MSCOCO | Recall@1 100 | 142 | |
| Text-to-image retrieval | MSCOCO | R@1 64.3 | 142 | |
| Text-to-Image Retrieval | MSCOCO 1K (test) | R@1 6,390 | 118 | |
| Visual Hallucination Evaluation | MSCOCO | CHAIR_i 18.2 | 104 | |
| Object Hallucination Evaluation | MSCOCO 2014 (val) | CHAIRs 56.8 | 81 | |
| Object Hallucination Evaluation | MSCOCO POPE | Random Accuracy 91.63 | 71 | |
| Image-to-text retrieval | MSCOCO 5K (test) | R@1 84.8 | 68 | |
| Sentence Retrieval | MSCOCO 5k (test) | R@1 80.9 | 67 | |
| Object Detection | MSCOCO | ASR 94.5 | 54 | |
| Text-to-Image Generation | MSCOCO 30K | FID 6.61 | 54 | |
| Text-to-Image Retrieval | MSCOCO (val) | R@1 38.97 | 51 | |
| Image-to-Text Retrieval | MSCOCO (val) | R@1 58.14 | 51 | |
| Text-to-image Retrieval | MSCOCO (5K) | R@1 53.98 | 51 | |
| Object Hallucination | MSCOCO 500 images 2014 (val) | Consistency Score (CS) 60.6 | 50 | |
| Text-to-Image Retrieval | MSCOCO | mAP@50 94 | 47 | |
| Object Hallucination Detection | MSCOCO | AUROC 89.62 | 46 | |
| Text-to-Image Generation | MSCOCO 2014 | FID (30k) 9.29 | 44 | |
| Object Hallucination Evaluation | MSCOCO | Accuracy 93.87 | 43 | |
| Object Detection | MSCOCO (val) | AP 61.3 | 43 | |
| Image-to-text Retrieval | MSCOCO (5K) | R@1 77.8 | 42 | |
| Pointing game | MSCOCO 2014 (val) | Mean Accuracy (All) 69.9 | 42 | |
| Image Retrieval | MSCOCO @5000 (test) | mAP 87.27 | 39 | |
| Object Hallucination Assessment | MSCOCO | CHAIR Instance Score 30.2 | 38 | |