TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

About

Weakly supervised object localization (WSOL) is challenging because object localization models must be learned from image category labels alone. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNNs, where convolution operations produce local receptive fields and have difficulty capturing long-range feature dependencies among pixels. We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in the visual transformer for long-range dependency extraction. TS-CAM first splits an image into a sequence of patch tokens for spatial embedding, which produces attention maps of long-range visual dependency to avoid partial activation. TS-CAM then re-allocates category-related semantics to the patch tokens, enabling each of them to be aware of object categories. TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization. Experiments on the ILSVRC/CUB-200-2011 datasets show that TS-CAM outperforms its CNN-CAM counterparts by 7.1%/27.1% for WSOL, achieving state-of-the-art performance.
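The coupling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the class-agnostic attention is the class token's attention to the patch tokens averaged over layers and heads, that coupling is an element-wise product with the per-class semantic map, and that a box is read off by simple thresholding. The function names, array shapes, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def couple_attention_with_semantics(cls_attention, semantic_maps, class_idx):
    """Combine a class-agnostic attention map with a class-specific semantic map.

    cls_attention: (layers, heads, N) attention from the class token to the
                   N = h * w patch tokens (assumed layout).
    semantic_maps: (num_classes, h, w) per-class maps derived from patch tokens.
    class_idx:     index of the category to localize.
    """
    _, h, w = semantic_maps.shape
    # Average over layers and heads, then restore the patch grid (assumption).
    agnostic = cls_attention.mean(axis=(0, 1)).reshape(h, w)
    # Element-wise product stands in for the "coupling" step (assumption).
    coupled = agnostic * semantic_maps[class_idx]
    # Min-max normalize to [0, 1] so that a fixed threshold is meaningful.
    lo, hi = coupled.min(), coupled.max()
    return (coupled - lo) / (hi - lo + 1e-8)


def map_to_box(loc_map, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) of all cells at or above threshold.

    A simplification: every cell above the threshold is enclosed, with no
    connected-component analysis. Returns None if nothing passes.
    """
    ys, xs = np.nonzero(loc_map >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy example on a 4x4 patch grid with 2 classes, 2 layers, and 3 heads.
h = w = 4
cls_attention = np.full((2, 3, h * w), 1.0 / (h * w))  # uniform attention
semantic_maps = np.zeros((2, h, w))
semantic_maps[1, 1:3, 1:3] = 1.0  # class 1 occupies the central 2x2 block

loc_map = couple_attention_with_semantics(cls_attention, semantic_maps, class_idx=1)
box = map_to_box(loc_map)  # box == (1, 1, 2, 2)
```

The box is in patch-grid coordinates; mapping it back to image pixels would require scaling by the patch size, which is not shown here.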

Wei Gao, Fang Wan, Xingjia Pan, Zhiliang Peng, Qi Tian, Zhenjun Han, Bolei Zhou, Qixiang Ye • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Weakly Supervised Object Localization | CUB (test) | Top-1 Loc Acc: 71.3 | 80 |
| Object Localization | ImageNet-1k (val) | Top-1 Loc Acc: 53.4 | 80 |
| Object Localization | CUB-200-2011 (test) | Top-1 Loc. Accuracy: 71.3 | 68 |
| Image Classification | CUB (test) | Top-1 Accuracy: 80.3 | 31 |
| Weakly Supervised Object Localization | CUB-200 (test) | Top-1 Loc Acc: 71.3 | 26 |
| Object Localization | CUB v2 | Max Box Acc V2: 76.7 | 20 |
| Image Classification | GBC dataset (test) | Accuracy: 86.2 | 15 |
| Weakly Supervised Object Localization | ILSVRC (test) | Top-1 Loc Acc: 53.4 | 14 |
| Weakly Supervised Object Localization | CUB-200-2011 v2 | MaxBoxAccV2: 79.6 | 10 |
| Weakly Supervised Object Localization | ImageNet-1k (val) | Top-1 Loc Acc: 53.4 | 10 |
Showing 10 of 19 rows
