
Fast Zero-Shot Image Tagging

About

The well-known word analogy experiments show that recent word vectors capture fine-grained linguistic regularities via linear vector offsets, but it is unclear how well such simple vector offsets can encode visual regularities over words. We study a particular image-word relevance relation in this paper. Our results show that the word vectors of the relevant tags for a given image rank ahead of the irrelevant tags along a principal direction in the word vector space. Inspired by this observation, we propose to solve image tagging by estimating that principal direction for each image. In particular, we exploit linear mappings and nonlinear deep neural networks to approximate the principal direction from an input image. We arrive at a quite versatile tagging model that runs fast at test time, in constant time with respect to the training set size. It not only gives superior performance on the conventional tagging task on the NUS-WIDE dataset, but also outperforms competitive baselines on annotating images with previously unseen tags.
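The core idea above can be sketched in a few lines: map an image feature to a direction in word-vector space, then rank candidate tags by how far their word vectors project along that direction. The dimensions, the random mapping `W`, and the helper `tag_image` below are illustrative assumptions, not the paper's trained parameters; in the actual model the mapping is learned from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions: a CNN image feature, 300-d word vectors, 81 candidate tags.
d_img, d_word, n_tags = 4096, 300, 81
W = rng.standard_normal((d_word, d_img)) * 0.01   # stand-in for the learned linear mapping
tag_vectors = rng.standard_normal((n_tags, d_word))  # word vectors of candidate tags

def tag_image(image_feature, k=5):
    """Rank candidate tags by projecting their word vectors onto the
    principal direction predicted from the image feature."""
    direction = W @ image_feature        # f(x): estimated principal direction
    scores = tag_vectors @ direction     # relevance = projection along that direction
    return np.argsort(-scores)[:k]       # indices of the top-k tags

x = rng.standard_normal(d_img)           # stand-in for a test image's CNN feature
top5 = tag_image(x)
```

Because tagging reduces to one matrix-vector product plus a sort over the tag vocabulary, the cost per test image is constant in the training set size, and unseen tags are handled simply by adding their word vectors to `tag_vectors`.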

Yang Zhang, Boqing Gong, Mubarak Shah • 2016

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-Label Classification | NUS-WIDE 925/81 (unseen) | mAP (Mean Average Precision) | 15.1 | 43 |
| Multi-Label Classification | NUS-WIDE | mAP | 22.4 | 38 |
| Multi-Label Classification | Open Images | mAP | 68.6 | 24 |
| Multi-Label Classification | MS COCO 48 seen / 17 unseen classes v1 | Precision | 38.5 | 18 |
| Multi-Label Classification | NUS-WIDE 81 unseen labels (test) | mAP | 0.224 | 17 |
| Image Tagging | NUS-WIDE 81 unseen tags | mAP | 42.2 | 16 |
| Multi-Label Classification | Open Images (test) | mAP | 69 | 16 |
| Image Tagging | NUS-WIDE 925 seen + 81 unseen tags (test) | MiAP | 19.1 | 14 |
| Multi-Label Classification | NUS-WIDE | Precision @ K=3 | 22.6 | 14 |
| Classification | MS-COCO (val) | F1 (K=3) | 37.5 | 10 |

Showing 10 of 21 rows.
