Visual Concept Recognition and Localization via Iterative Introspection

About

Convolutional neural networks have been shown to develop internal representations that correspond closely to semantically meaningful objects and parts, even though they are trained solely on class labels. Class Activation Mapping (CAM) is a recent method that makes it easy to highlight the image regions contributing to a network's classification decision. We build upon these two developments to enable a network to re-examine informative image regions, a process we term introspection. We propose a weakly supervised iterative scheme that alternates stages of classification and introspection, shifting its center of attention to increasingly discriminative regions as it progresses. We evaluate our method on several datasets, where we obtain competitive or state-of-the-art results: on Stanford-40 Actions, we set a new state of the art of 81.74%. On FGVC-Aircraft and the Stanford Dogs dataset, we show consistent improvements over baselines, some of which use significantly more supervision.
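
The alternation of classification and introspection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the fixed square window, the summing of scores across iterations, and the assumption that feature maps share the spatial size of the input region are all choices made here for illustration. The only part taken from the CAM formulation is that the map for a class is the weighted sum of the last convolutional feature maps, using the weights of the linear layer that follows global average pooling.

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """CAM for one class: weighted sum of the final conv feature maps.

    features: (K, H, W) activations of the last conv layer
    weights:  (C, K) weights of the linear layer after global average pooling
    """
    return np.tensordot(weights[class_idx], features, axes=1)  # shape (H, W)

def peak_window(cam, size):
    """Top-left corner of a size x size window centred on the CAM maximum,
    clipped so the window stays inside the map."""
    h, w = cam.shape
    r, c = np.unravel_index(np.argmax(cam), cam.shape)
    top = int(np.clip(r - size // 2, 0, h - size))
    left = int(np.clip(c - size // 2, 0, w - size))
    return top, left

def classify_with_introspection(image, extract_features, weights, n_iters=2, size=4):
    """Alternate classification and introspection (hypothetical sketch).

    extract_features is a stand-in for a CNN: it maps a 2-D region to
    (K, H, W) feature maps with the same H, W as the region (an assumption).
    Scores are summed over iterations, which is also an assumption.
    """
    scores = np.zeros(weights.shape[0])
    region = image
    for _ in range(n_iters):
        feats = extract_features(region)
        scores += weights @ feats.mean(axis=(1, 2))   # classification step
        cam = class_activation_map(feats, weights, int(np.argmax(scores)))
        s = min(size, *cam.shape)
        top, left = peak_window(cam, s)               # introspection step
        region = region[top:top + s, left:left + s]   # zoom in on the peak
    return scores
```

With a real network, `extract_features` would be replaced by a forward pass up to the last convolutional layer, and the window coordinates would need to be mapped from feature-map resolution back to image pixels.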

Amir Rosenfeld, Shimon Ullman • 2016

Related benchmarks

Task	Dataset	Result	Rank
Action Recognition	Stanford 40 (test)	Accuracy: 81.7%	13

