| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image Captioning | nocaps (val) | CIDEr (Overall) 126.9 | 115 |
| Image Captioning | NoCaps | CIDEr 127 | 111 |
| Image Captioning | NoCaps (test) | CIDEr (overall) 126.4 | 61 |
| Image-Text Alignment Evaluation | nocaps out-of-domain (val) | CLIPScore 84.1 | 40 |
| Image Captioning | NoCaps | CIDEr (in-domain) 111.3 | 36 |
| Image Captioning | NoCaps 1.0 (val) | Overall Score 127 | 32 |
| Caption Matching and Retrieval | NoCaps (val) | Matching Accuracy 99.5 | 26 |
| Image Captioning | nocaps standard (test) | CIDEr 124.8 | 26 |
| Scene captioning | nocaps RGBP seen scenes (val) | CIDEr 102.67 | 22 |
| Caption Evaluation | nocaps | Win Rate 71.1 | 20 |
| Object Hallucination Detection | nocaps FOIL (Out-Domain) | AP 89.1 | 17 |
| Object Hallucination Detection | nocaps-FOIL (Near-Domain) | AP 92.6 | 17 |
| Object Hallucination Detection | nocaps FOIL In-Domain | AP 88.8 | 17 |
| Object Hallucination Detection | nocaps-FOIL (Overall) | AP 91.1 | 17 |
| Text-to-image retrieval | NoCaps | Recall@1 76.2 | 17 |
| Image-to-text retrieval | NoCaps | R@1 90.9 | 17 |
| Out-of-domain Image Captioning | NoCaps | CIDEr 1.055 | 16 |
| Novel Object Captioning | NoCaps (val) | CIDEr (In-Domain) 85.4 | 16 |
| Image Captioning | Nocaps | CIDEr 83.7 | 15 |
| Image Captioning | Nocaps | Primary Score 109.37 | 14 |
| Text-to-text retrieval | nocaps | mAP 43.7 | 12 |
| Image Captioning | NoCaps 4,500 (test) | CIDEr 122.1 | 12 |
| Image Captioning | Nocaps | Clean CIDEr 105.7 | 10 |
| Image Captioning | NoCaps | BLEU-4 47.7 | 9 |
| Image Captioning | nocaps XD (val) | CIDEr 106.8 | 8 |