| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Image Captioning | TextCaps | CIDEr 164.3 | 96 |
| Image Captioning | TextCaps (val) | CIDEr 163.7 | 51 |
| Image Captioning | TextCaps (test) | CIDEr 164.3 | 50 |
| Text-oriented Visual Question Answering | TextCaps | CIDEr 144.9 | 7 |
| Image Reconstruction | TextCaps (test) | FID 15.51 | 6 |