
Top-down Neural Attention by Excitation Backprop

About

We aim to model the top-down attention of a Convolutional Neural Network (CNN) classifier for generating task-specific attention maps. Inspired by a top-down human visual attention model, we propose a new backpropagation scheme, called Excitation Backprop, to pass top-down signals down the network hierarchy via a probabilistic Winner-Take-All process. Furthermore, we introduce the concept of contrastive attention to make the top-down attention maps more discriminative. In experiments, we demonstrate the accuracy and generalizability of our method in weakly supervised localization tasks on the MS COCO, PASCAL VOC07 and ImageNet datasets. The usefulness of our method is further validated in the text-to-region association task. On the Flickr30k Entities dataset, we achieve promising performance in phrase localization by leveraging the top-down attention of a CNN model that has been trained on weakly labeled web images.
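The core mechanism described above can be sketched for a single fully connected layer. This is a minimal NumPy sketch, not the authors' implementation: `excitation_backprop_step` is a hypothetical helper name, and the layer shapes are illustrative. Following the probabilistic Winner-Take-All formulation, only positive (excitatory) weights compete, and each upper-layer neuron distributes its marginal winning probability over lower-layer neurons in proportion to activation times positive weight.

```python
import numpy as np

def excitation_backprop_step(p_parent, a_child, W):
    """One Excitation Backprop step through a fully connected layer (sketch).

    p_parent : (m,) marginal winning probabilities of the upper layer
    a_child  : (n,) non-negative (post-ReLU) activations of the lower layer
    W        : (m, n) weights; W[i, j] connects lower neuron j to upper neuron i

    Only excitatory (positive) weights take part in the probabilistic
    Winner-Take-All competition; each upper neuron distributes its winning
    probability over its children in proportion to a_j * w_ij^+.
    """
    W_pos = np.maximum(W, 0.0)                 # keep excitatory connections only
    Z = W_pos @ a_child                        # per-parent normalization constants
    Z[Z == 0.0] = 1.0                          # parents with no excitation pass nothing
    cond = (W_pos * a_child[None, :]) / Z[:, None]  # P(child j wins | parent i won)
    return cond.T @ p_parent                   # marginal winning probs of lower layer
```

Starting from a one-hot distribution on the output neuron of the target class and applying such a step layer by layer down to an intermediate convolutional layer yields the attention map; the contrastive variant sharpens it by subtracting the map obtained with the target classifier weights negated.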

Jianming Zhang, Zhe Lin, Jonathan Brandt, Xiaohui Shen, Stan Sclaroff • 2016

Related benchmarks

Task                         | Dataset                   | Metric                   | Result | Rank
Pointing localization        | VOC 2007 (test)           | Mean Accuracy (All)      | 90.7   | 44
Pointing game                | MSCOCO 2014 (val)         | Mean Accuracy (All)      | 58.5   | 42
Phrase Localization          | Flickr30K Entities (test) | --                       | --     | 35
Phrase Localization          | VisualGenome (VG) (test)  | Pointing Accuracy        | 19.31  | 29
Pointing localization        | VOC Difficult 2007 (test) | Accuracy                 | 72.3   | 21
Phrase grounding             | Flickr30K                 | --                       | --     | 20
Phrase grounding             | ReferIt (test)            | Pointing Accuracy        | 31.97  | 18
Visual Grounding             | ReferIt                   | Pointing Game Accuracy   | 31.97  | 16
Weakly Supervised Grounding  | Visual Genome (VG) (test) | Accuracy (Pointing Game) | 19.31  | 15
Weakly Supervised Grounding  | Flickr30k (test)          | Accuracy (Pointing Game) | 42.4   | 14

(Showing 10 of 18 rows.)
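Several of the benchmarks above use pointing-game accuracy, where a prediction counts as a hit if the maximum of the attention map lands on (or near) the ground-truth region. A minimal sketch of that check, assuming a binary ground-truth mask and a hypothetical pixel tolerance parameter `tol`:

```python
import numpy as np

def pointing_game_hit(attn_map, gt_mask, tol=15):
    """Return True if the attention map's argmax lies within `tol` pixels
    of the ground-truth region (sketch; `tol` value is an assumption)."""
    y, x = np.unravel_index(np.argmax(attn_map), attn_map.shape)
    ys, xs = np.nonzero(gt_mask)
    if ys.size == 0:
        return False                         # no ground-truth region to hit
    return bool(np.min(np.hypot(ys - y, xs - x)) <= tol)
```

Mean accuracy is then the fraction of hits over all (image, category) or (image, phrase) pairs evaluated.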
