Learning When and Where to Zoom with Deep Reinforcement Learning

About

While high resolution images contain semantically more useful information than their lower resolution counterparts, processing them is computationally more expensive, and in some applications, e.g. remote sensing, they can be much more expensive to acquire. For these reasons, it is desirable to develop an automatic method that selectively uses high resolution data only when necessary, maintaining accuracy while reducing acquisition and run-time cost. In this direction, we propose PatchDrop, a reinforcement learning approach that dynamically identifies when and where to use or acquire high resolution data, conditioned on the paired, cheap, low resolution images. We conduct experiments on the CIFAR10, CIFAR100, ImageNet and fMoW datasets, where we use significantly less high resolution data while maintaining accuracy similar to that of models which use full high resolution images.
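
To make the idea of selectively using high resolution data concrete, here is a minimal sketch of patch-level masking. It is not the authors' implementation. It assumes that a policy, looking only at the low resolution image, has already produced one keep probability per patch, and that keep/drop decisions are drawn as independent Bernoulli samples; both points are assumptions, since the abstract does not specify them. The function names, the 2x2 patch grid, and the probability values are hypothetical.

```python
import random


def sample_patch_mask(keep_probs, rng):
    """Draw one keep (1) / drop (0) decision per patch.

    keep_probs is a 2D grid of probabilities. In this sketch it stands in
    for the output of a policy that only sees the low resolution image
    (an assumption, not something the abstract spells out).
    """
    return [[1 if rng.random() < p else 0 for p in row] for row in keep_probs]


def apply_patch_mask(image, mask, patch_size):
    """Zero out every pixel that belongs to a dropped patch.

    image is a 2D list of pixel values whose height and width must be
    multiples of patch_size, so that each pixel maps to one mask cell.
    """
    height, width = len(image), len(image[0])
    return [
        [image[r][c] * mask[r // patch_size][c // patch_size] for c in range(width)]
        for r in range(height)
    ]


def acquired_fraction(mask):
    """Fraction of high resolution patches that were kept (acquired)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 4x4 "high resolution" image split into a 2x2 grid of patches.
    high_res = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    # Hypothetical per-patch keep probabilities.
    keep_probs = [[0.9, 0.1], [0.2, 0.8]]
    mask = sample_patch_mask(keep_probs, rng)
    masked = apply_patch_mask(high_res, mask, patch_size=2)
    print(mask, acquired_fraction(mask))
    for row in masked:
        print(row)
```

In such a setup, a downstream classifier would see only the kept patches together with the low resolution image, and the acquired fraction gives a simple proxy for acquisition and run-time cost.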

Burak Uzkent, Stefano Ermon • 2020

Related benchmarks

Task	Dataset	Result
Visual Active Search	DOTA	ANT 0.42	162
Image Classification	fMoW (test)	Top-1 Accuracy 67.3	60
Visual Active Search	xView (test)	ANT (C=25) 112	54
Visual Active Search	xView single-query setting SB (Sail Boat) as Target (test)	ANT 103	39
Visual Active Search	xView single-query setting Building as Target (test)	ANT 8.01	39
Visual Active Search	xView single-query setting with SC (Small Car) as Target (test)	ANT 6.71	39
Visual Active Search	DOTA Ship (test)	ANT 2.96	27
Visual Active Search	DOTA Roundabout (test)	ANT 2.99	27
Visual Active Search	xView single-query setting CC (Construction Site) as Target (test)	ANT 2.33	27
Visual Active Search	DOTA Plane (test)	ANT 5.25	27

Showing 10 of 18 rows
