
Fast Zero-Shot Image Tagging

About

The well-known word analogy experiments show that recent word vectors capture fine-grained linguistic regularities via linear vector offsets, but it is unclear how well such simple vector offsets can encode visual regularities over words. We study a particular image-word relevance relation in this paper. Our results show that the word vectors of the relevant tags for a given image rank ahead of the irrelevant tags along a principal direction in the word vector space. Inspired by this observation, we propose to solve image tagging by estimating that principal direction for each image. In particular, we exploit linear mappings and nonlinear deep neural networks to approximate the principal direction from an input image. We arrive at a quite versatile tagging model that runs fast at test time, in constant time with respect to the training set size. It not only gives superior performance on the conventional tagging task on the NUS-WIDE dataset, but also outperforms competitive baselines on annotating images with previously unseen tags.
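The core idea above can be sketched in a few lines: map an image feature to a direction in word-vector space, then rank candidate tags by how far their word vectors project along that direction. The dimensions, the random mapping `W`, and the helper `tag_image` below are illustrative assumptions, not the paper's trained parameters; in the actual model the mapping is learned from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: a CNN image feature, 300-d word vectors, 81 candidate tags.
d_img, d_word, n_tags = 4096, 300, 81
W = rng.standard_normal((d_word, d_img)) * 0.01   # stand-in for the learned linear mapping
tag_vectors = rng.standard_normal((n_tags, d_word))  # word vectors of candidate tags

def tag_image(image_feature, k=5):
    """Rank candidate tags by projecting their word vectors onto the
    principal direction predicted from the image feature."""
    direction = W @ image_feature        # f(x): estimated principal direction
    scores = tag_vectors @ direction     # relevance = projection along that direction
    return np.argsort(-scores)[:k]       # indices of the top-k tags

x = rng.standard_normal(d_img)           # stand-in for a test image's CNN feature
top5 = tag_image(x)
```

Because tagging reduces to one matrix-vector product plus a sort over the tag vocabulary, the cost per test image is constant in the training set size, and unseen tags are handled simply by adding their word vectors to `tag_vectors`.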

Yang Zhang, Boqing Gong, Mubarak Shah • 2016

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-Label Classification | NUS-WIDE 925/81 (unseen) | mAP (Mean Average Precision) | 15.1 | 43 |
| Multi-Label Classification | NUS-WIDE | mAP | 22.4 | 38 |
| Multi-Label Classification | Open Images | mAP | 68.6 | 24 |
| Multi-Label Classification | MS COCO 48 seen / 17 unseen classes v1 | Precision | 38.5 | 18 |
| Multi-Label Classification | NUS-WIDE 81 unseen labels (test) | mAP | 0.224 | 17 |
| Image Tagging | NUS-WIDE 81 unseen tags | mAP | 42.2 | 16 |
| Multi-Label Classification | Open Images (test) | mAP | 69 | 16 |
| Image Tagging | NUS-WIDE 925 seen + 81 unseen tags (test) | MiAP | 19.1 | 14 |
| Multi-Label Classification | NUS-WIDE | Precision @ K=3 | 22.6 | 14 |
| Classification | MS-COCO (val) | F1 (K=3) | 37.5 | 10 |

Showing 10 of 21 rows.
