Order-Embeddings of Images and Language

About

Hypernymy, textual entailment, and image captioning can be seen as special cases of a single visual-semantic hierarchy over words, sentences, and images. In this paper we advocate for explicitly modeling the partial order structure of this hierarchy. Towards this goal, we introduce a general method for learning ordered representations, and show how it can be applied to a variety of tasks involving images and language. We show that the resulting representations improve performance over current approaches for hypernym prediction and image-caption retrieval.

Ivan Vendrov, Ryan Kiros, Sanja Fidler, Raquel Urtasun• 2015

Related benchmarks

Task	Dataset	Result
Natural Language Inference	SNLI (test)	Accuracy88.6	694
Image-to-Text Retrieval	MS-COCO 5K (test)	R@123.3	324
Text-to-Image Retrieval	MSCOCO 5K (test)	R@131.7	312
Natural Language Inference	SNLI (train)	Accuracy98.8	154
Image Retrieval	MS-COCO 1K (test)	R@137.9	128
Text-to-Image Retrieval	MSCOCO (1K test)	R@137.9	118
Image-to-Text Retrieval	MSCOCO (1K test)	R@146.7	96
Caption Retrieval	MS COCO Karpathy 1k (test)	R@146.7	62
Link Prediction	WordNet noun hierarchy (transitive closure) (test)	F184.1	40
Caption Retrieval	MS COCO Karpathy 5k (test)	R@131.7	26

Showing 10 of 34 rows

Other info

Follow for update

@wizwand_team Discord