Learning a Weight Map for Weakly-Supervised Localization

About

In the weakly supervised localization setting, supervision is given only as an image-level label. We propose to employ an image classifier $f$ and to train a generative network $g$ that, given the input image, outputs a per-pixel weight map indicating the location of the object within the image. Network $g$ is trained by minimizing the discrepancy between the output of the classifier $f$ on the original image and its output on the same image weighted by the output of $g$. The scheme requires a regularization term that ensures that $g$ does not provide a uniform weight, and an early stopping criterion that prevents $g$ from over-segmenting the image. Our results indicate that the method outperforms existing localization methods by a sizable margin on challenging fine-grained classification datasets, as well as on a generic image recognition dataset. The obtained weight map is also state-of-the-art in weakly supervised segmentation on fine-grained categorization datasets.
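The training objective described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the squared-error discrepancy, the mean-weight penalty used as the regularizer, the `reg_strength` parameter, and the `toy_classifier` standing in for $f$ are all assumptions made for illustration, since the abstract does not specify these details. The weight map is passed in directly, where in the method it would be the output of $g$.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as H x W nested lists


def apply_weight_map(image: Image, weight_map: Image) -> Image:
    """Multiply each pixel by its weight (the image 'weighted by the output of g')."""
    return [
        [pixel * weight for pixel, weight in zip(img_row, map_row)]
        for img_row, map_row in zip(image, weight_map)
    ]


def weight_map_loss(
    f: Callable[[Image], List[float]],
    image: Image,
    weight_map: Image,
    reg_strength: float = 0.1,  # hypothetical hyperparameter
) -> float:
    """Discrepancy between f(image) and f(weighted image), plus a regularizer.

    Assumption: the regularizer is the mean weight, which penalizes the
    trivial all-ones map. The actual term used in the paper may differ.
    """
    original_out = f(image)
    weighted_out = f(apply_weight_map(image, weight_map))
    discrepancy = sum((a - b) ** 2 for a, b in zip(original_out, weighted_out))
    n_pixels = sum(len(row) for row in weight_map)
    mean_weight = sum(sum(row) for row in weight_map) / n_pixels
    return discrepancy + reg_strength * mean_weight


def toy_classifier(image: Image) -> List[float]:
    """Hypothetical stand-in for f: total intensity of the left and right halves."""
    half = len(image[0]) // 2
    left = sum(sum(row[:half]) for row in image)
    right = sum(sum(row[half:]) for row in image)
    return [left, right]


# A 2 x 4 image whose "object" occupies the left half.
image = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0]]
uniform_map = [[1.0] * 4 for _ in range(2)]   # keeps everything
focused_map = [[1.0, 1.0, 0.0, 0.0],
               [1.0, 1.0, 0.0, 0.0]]          # keeps only the object
empty_map = [[0.0] * 4 for _ in range(2)]     # discards everything

loss_uniform = weight_map_loss(toy_classifier, image, uniform_map)
loss_focused = weight_map_loss(toy_classifier, image, focused_map)
loss_empty = weight_map_loss(toy_classifier, image, empty_map)
```

In this toy setting the focused map has the lowest loss: it preserves the classifier output just as the uniform map does but pays a smaller regularization penalty, while the empty map changes the classifier output and incurs a large discrepancy. The early stopping criterion mentioned in the abstract concerns the training of $g$ and is not modeled here.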

Tal Shaharabany, Lior Wolf • 2021

Related benchmarks

Task	Dataset	Metric	Result	Rank
Weakly Supervised Object Localization	Stanford Cars (test)	Localization Accuracy	96.2	7
Weakly Supervised Object Localization	Stanford Dogs (test)	Localization Accuracy	79.2	7
