| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-modal Dialogue Retrieval | PhotoChat (test) | R@1 | 28.3 | 29 |
| Intent Prediction | PhotoChat (test) | F1 Score | 65.6 | 26 |
| Text-to-Image Retrieval | PhotoChat (test) | R@1 | 44.17 | 19 |
| Text Response Generation | PhotoChat (test) | Perplexity (PPL) | 59.21 | 6 |
| Image Generation | PhotoChat (test) | FID | 29.04 | 6 |
| Image Description Generation | PhotoChat (test) | Perplexity (PPL) | 5.12 | 6 |
| Dialogue Response Generation | PhotoChat (test) | Kappa | 0.68 | 5 |
| Image Generation | PhotoChat | FID | 9.72 | 4 |
| Response Generation | PhotoChat | BLEU-1 | 23.6 | 4 |