Differentiable Patch Selection for Image Recognition

About

Neural networks require large amounts of memory and compute to process high-resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator that selects the most relevant parts of the input, so that high-resolution images can be processed efficiently. Our method can be interfaced with any downstream neural network, aggregates information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
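The abstract above does not say how the Top-K operator is made differentiable. As a rough illustration only, the sketch below assumes one possible relaxation: add Gaussian noise to the per-patch relevance scores, take a hard top-k on each noisy copy, and average the resulting one-hot indicators. The function names (`hard_top_k`, `perturbed_top_k`, `extract_patches`), the noise level `sigma`, and the sample count are hypothetical choices for this example, not taken from the paper's code. It uses only the Python standard library and computes the forward pass; it does not implement gradient estimation.

```python
import random


def hard_top_k(scores, k):
    """Indices of the k highest scores, returned in ascending index order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def perturbed_top_k(scores, k, num_samples=500, sigma=0.05, seed=0):
    """Monte Carlo average of hard top-k indicators under Gaussian noise.

    Returns a k x n matrix. Row j is a soft one-hot vector over the n
    patches for the j-th selected slot, and each row sums to 1.
    (Assumed relaxation for illustration; sigma and num_samples are arbitrary.)
    """
    rng = random.Random(seed)
    n = len(scores)
    soft = [[0.0] * n for _ in range(k)]
    for _ in range(num_samples):
        noisy = [s + sigma * rng.gauss(0.0, 1.0) for s in scores]
        for slot, idx in enumerate(hard_top_k(noisy, k)):
            soft[slot][idx] += 1.0 / num_samples
    return soft


def extract_patches(patches, indicators):
    """Per slot, the indicator-weighted sum of patches (flat lists of floats)."""
    dim = len(patches[0])
    return [
        [sum(w * p[d] for w, p in zip(row, patches)) for d in range(dim)]
        for row in indicators
    ]


# Four candidate patches with hypothetical relevance scores.
scores = [0.1, 0.9, 0.2, 0.8]
patches = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

indicators = perturbed_top_k(scores, k=2)
selected = extract_patches(patches, indicators)
```

With a small `sigma` the soft indicators are close to the hard selection of patches 1 and 3. A larger `sigma` spreads weight over more patches, which is what lets a training signal reach the scores of patches that were not selected. Sorting the selected indices keeps the patches in their original spatial order, a design choice of this sketch.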

Jean-Baptiste Cordonnier, Aravindh Mahendran, Alexey Dosovitskiy, Dirk Weissenborn, Jakob Uszkoreit, Thomas Unterthiner • 2021

Related benchmarks

Task | Dataset | Result | Rank
--- | --- | --- | ---
Time-series classification | SelfRegulationSCP2 | Accuracy 55.1 | 55
Time-series classification | Heartbeat | Accuracy 70.5 | 51
Time-series classification | SelfRegulationSCP1 | Accuracy 87.2 | 45
Multivariate Time Series Classification | Finger Movement | Accuracy 58 | 39
Time-series classification | FaceDetection | Accuracy 65.4 | 34
Multivariate Time Series Classification | MotorImagery | Accuracy 53 | 28
Fine-grained visual classification | CUB-200 | Accuracy 86.7 | 24
Traffic Sign Recognition | Swedish traffic signs dataset Subset setup (test) | Accuracy 91.7 | 7
Binary Classification | Traffic Signs Recognition (test) | Accuracy 91.7 | 6
Time-series classification | WalkingSittingStanding | Accuracy 0.897 | 6
