Order-Embeddings of Images and Language
About
Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images. In this paper we advocate for explicitly modeling the partial order structure of this hierarchy. Towards this goal, we introduce a general method for learning ordered representations, and show how it can be applied to a variety of tasks involving images and language. We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval.
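A minimal sketch of the core idea follows. It assumes the reversed product order on non-negative vectors and the squared order-violation penalty E(u, v) = ||max(0, v - u)||^2, where u is the more specific item (e.g. a hyponym or an image) and v the more general one. The function names, the margin, the threshold, and the toy embeddings are illustrative assumptions, not the authors' code.

```python
import numpy as np


def order_violation(specific: np.ndarray, general: np.ndarray) -> np.ndarray:
    """Squared order-violation penalty E(u, v) = ||max(0, v - u)||^2.

    Under the reversed product order on non-negative vectors, u <= v
    ("u is more specific than v") holds when every coordinate of u is >=
    the matching coordinate of v, so the penalty is 0 exactly when the
    order is satisfied. The last axis is the embedding dimension, so
    batches of pairs work too.
    """
    return np.sum(np.maximum(0.0, general - specific) ** 2, axis=-1)


def pairwise_max_margin_loss(pos_u, pos_v, neg_u, neg_v, margin=1.0):
    """Drive E to 0 on true (specific, general) pairs and above `margin` on negatives.

    `margin` is an illustrative default, not a tuned value.
    """
    positive = order_violation(pos_u, pos_v).sum()
    negative = np.maximum(0.0, margin - order_violation(neg_u, neg_v)).sum()
    return float(positive + negative)


def predict_is_a(u, v, threshold):
    """Predict u <= v (e.g. hypernymy) when the penalty is below `threshold`.

    The threshold would normally be chosen on held-out data; any value
    used here is hypothetical.
    """
    return order_violation(u, v) < threshold


# Hypothetical toy embeddings: "dog" is more specific, so it sits farther
# from the origin than "animal" in every coordinate.
dog = np.array([2.0, 1.5, 0.8])
animal = np.array([1.0, 0.5, 0.0])

print(order_violation(dog, animal))   # 0.0: dog <= animal is satisfied
print(order_violation(animal, dog))   # about 2.64: animal <= dog is violated
print(pairwise_max_margin_loss(dog, animal, animal, dog))
print(predict_is_a(dog, animal, threshold=0.5))
```

Because the penalty is asymmetric, the same two vectors give zero cost in one direction and a positive cost in the other, which is what lets a single embedding space encode a partial order rather than a symmetric similarity.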
Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun (2015)
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Natural Language Inference | SNLI (test) | Accuracy | 88.6 | 681 |
| Image-to-Text Retrieval | MS-COCO 5K (test) | R@1 | 23.3 | 299 |
| Text-to-Image Retrieval | MSCOCO 5K (test) | R@1 | 31.7 | 286 |
| Natural Language Inference | SNLI (train) | Accuracy | 98.8 | 154 |
| Image Retrieval | MS-COCO 1K (test) | R@1 | 37.9 | 128 |
| Text-to-Image Retrieval | MSCOCO (1K test) | R@1 | 37.9 | 104 |
| Image-to-Text Retrieval | MSCOCO (1K test) | R@1 | 46.7 | 82 |
| Caption Retrieval | MS COCO Karpathy 1k (test) | R@1 | 46.7 | 62 |
| Link Prediction | WordNet noun hierarchy (transitive closure) (test) | F1 | 84.1 | 40 |
| Caption Retrieval | MS COCO Karpathy 5k (test) | R@1 | 31.7 | 26 |
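Most retrieval rows above report Recall@1 (R@1). As a hedged sketch of how such a number can be computed from order-embeddings, the snippet below scores each (image, caption) pair by the negative order-violation penalty, treating captions as abstractions of images, and counts an image as a hit when one of its own captions appears in its top K. The function names, array layout, and toy data are illustrative assumptions; official benchmark protocols (such as the COCO 1K and 5K test splits, with five captions per image) involve details not shown here.

```python
import numpy as np


def order_violation(specific, general):
    """Squared order-violation penalty E(u, v) = ||max(0, v - u)||^2 over the last axis."""
    return np.sum(np.maximum(0.0, general - specific) ** 2, axis=-1)


def image_to_text_recall_at_k(images, captions, caption_owner, k=1):
    """Fraction of images whose top-k captions include at least one of their own.

    images: (n_images, d) and captions: (n_captions, d) non-negative embeddings;
    caption_owner[j] is the index of the image that caption j describes.
    Captions are treated as abstractions of images, so the score of
    (image i, caption j) is -E(images[i], captions[j]): higher is better.
    """
    owner = np.asarray(caption_owner)
    # Broadcast to (n_images, n_captions, d), then reduce to an
    # (n_images, n_captions) score matrix.
    scores = -order_violation(images[:, None, :], captions[None, :, :])
    # Best-scoring captions first; a stable sort keeps ties in index order.
    ranked = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    hits = [np.any(owner[ranked[i]] == i) for i in range(images.shape[0])]
    return float(np.mean(hits))


# Hypothetical toy data: two images, three captions
# (captions 0 and 2 describe image 0, caption 1 describes image 1).
images = np.array([[2.0, 2.0], [0.5, 3.0]])
captions = np.array([[1.0, 1.0], [0.2, 2.5], [1.5, 0.1]])
print(image_to_text_recall_at_k(images, captions, [0, 1, 0], k=1))
```

Text-to-image R@K follows the same pattern with the roles swapped: rank images for each caption and check whether the caption's own image lands in the top K.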