| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image Captioning | NoCaps | CIDEr 127 | 101 |
| Image Captioning | nocaps (val) | CIDEr (Overall) 126.9 | 93 |
| Image Captioning | NoCaps (test) | CIDEr (overall) 126.4 | 61 |
| Image Captioning | NoCaps | CIDEr (in-domain) 111.3 | 36 |
| Image Captioning | NoCaps 1.0 (val) | Overall Score 127 | 29 |
| Caption Matching and Retrieval | NoCaps (val) | Matching Accuracy 99.5 | 26 |
| Image Captioning | nocaps standard (test) | CIDEr 124.8 | 26 |
| Text-to-image retrieval | NoCaps | Recall@1 76.2 | 17 |
| Image-to-text retrieval | NoCaps | R@1 90.9 | 17 |
| Out-of-domain Image Captioning | NoCaps | CIDEr 1.055 | 16 |
| Novel Object Captioning | NoCaps (val) | CIDEr (In-Domain) 85.4 | 16 |
| Image Captioning | Nocaps | CIDEr 83.7 | 15 |
| Image Captioning | Nocaps | Primary Score 109.37 | 14 |
| Image Captioning | NoCaps 4,500 (test) | CIDEr 122.1 | 12 |
| Image Captioning | nocaps XD (val) | CIDEr 106.8 | 8 |
| Image Captioning | NoCaps 1.0 (test) | CIDEr 97.1 | 7 |
| Super-Resolution | Nocaps 16x (test) | LR PSNR (dB) 78.42 | 6 |
| Super-Resolution | Nocaps 8x (test) | PSNR (dB) 72.94 | 6 |
| Image Reconstruction | NOCAPS (val) | LPIPS 0.205 | 5 |
| Image Retrieval | nocaps out (val) | P@1 88.7 | 5 |
| Image Retrieval | nocaps near (val) | P@1 89.1 | 5 |
| Image Captioning | nocaps XD (test) | CIDEr 102.4 | 5 |
| Image Captioning | NoCaps | CIDEr-D (In) 42.1 | 4 |
| Image Captioning | NoCaps zero-shot | CIDEr 119 | 4 |
| Image Captioning | NoCaps Zero-shot (val) | CIDEr (in-domain) 114.67 | 2 |