Differentiable Patch Selection for Image Recognition

About

Neural networks require large amounts of memory and compute to process high-resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high-resolution images. Our method can be interfaced with any downstream neural network, is able to aggregate information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
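The abstract does not spell out how the Top-K operator is made differentiable. Below is a minimal NumPy sketch that assumes one common construction, a perturbed-optimizer estimator: the hard top-K indicator matrix is averaged over Gaussian-noised copies of the patch scores, and the same noise samples give a Monte-Carlo gradient for the scores. The function names, the noise scale `sigma`, the sample count, the non-overlapping square patch grid, and the convention of ordering selected patches by position are all illustrative assumptions, not the authors' implementation; random numbers stand in for a real scorer network.

```python
import numpy as np


def hard_topk_indicators(scores, k):
    """(k, n) one-hot matrix picking the k highest scores.

    Rows are ordered by patch index (an assumed convention), so row i
    holds the i-th selected patch in raster order.
    """
    n = scores.shape[-1]
    idx = np.sort(np.argsort(scores)[-k:])  # top-k indices in raster order
    ind = np.zeros((k, n))
    ind[np.arange(k), idx] = 1.0
    return ind


def perturbed_topk(scores, k, sigma=0.05, num_samples=500, seed=0):
    """Smoothed top-k indicators plus a Monte-Carlo vector-Jacobian product.

    Forward: average the hard indicators of scores + sigma * Z, Z ~ N(0, I).
    Backward: d<G, soft>/d scores is estimated as E[<G, Y(Z)> Z] / sigma.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((num_samples, scores.shape[-1]))
    samples = np.stack(
        [hard_topk_indicators(scores + sigma * z, k) for z in noise]
    )  # (num_samples, k, n)
    soft = samples.mean(axis=0)  # (k, n); each row sums to 1

    def vjp(upstream):
        # upstream: gradient of the loss w.r.t. soft, shape (k, n)
        weights = np.einsum("skn,kn->s", samples, upstream)
        return (weights[:, None] * noise).mean(axis=0) / sigma

    return soft, vjp


def extract_patches(image, patch):
    """Split an (H, W, C) image into non-overlapping patches, raster order."""
    h, w, c = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)


# Usage with random stand-ins for the image and for a scorer network's output.
rng = np.random.default_rng(1)
image = rng.random((32, 32, 3))
patches = extract_patches(image, patch=8)  # (16, 192): 16 patches of 8*8*3 values
scores = rng.standard_normal(patches.shape[0])  # hypothetical patch scores
soft, vjp = perturbed_topk(scores, k=4)
selected = soft @ patches  # (4, 192): input for any downstream network

# Toy loss = sum(selected); its gradient w.r.t. soft is ones @ patches.T.
upstream = np.ones_like(selected) @ patches.T  # (4, 16)
grad_scores = vjp(upstream)  # (16,): gradient that would reach the scorer
```

Multiplying the K×N soft indicator matrix by the N×D patch matrix yields K patch vectors that a downstream network can aggregate however it likes. With a small `sigma` the forward pass is close to copying the exact top-K patches, while the backward pass still returns a nonzero gradient for scores near the selection boundary, which is what allows end-to-end training.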

Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner • 2021

Related benchmarks

Task	Dataset	Metric	Result
Time-series classification	SelfRegulationSCP2	Accuracy	55.1	148
Time-series classification	Heartbeat	Accuracy	70.5	131
Time-series classification	SelfRegulationSCP1	Accuracy	87.2	123
Time-series classification	FaceDetection	Accuracy	65.4	58
Multivariate Time Series Classification	Finger Movement	Accuracy	58	49
Multivariate Time Series Classification	MotorImagery	Accuracy	53	41
Fine-grained visual classification	CUB-200	Accuracy	86.7	24
Traffic Sign Recognition	Swedish traffic signs dataset Subset setup (test)	Accuracy	91.7	7
Binary Classification	Traffic Signs Recognition (test)	Accuracy	91.7	6
Time-series classification	WalkingSittingStanding	Accuracy	0.897	6

