
Dynamic Computational Time for Visual Attention

About

We propose a dynamic computational time model to reduce the average processing time of the recurrent attention model (RAM). Rather than attending for a fixed number of steps on every input image, the model learns to decide when to stop on the fly. To achieve this, we add an extra continue/stop action at each time step of RAM and use reinforcement learning to learn both the optimal attention policy and the stopping policy. The modification is simple but can dramatically reduce the average computation time while keeping the same recognition performance as RAM. Experimental results on the CUB-200-2011 and Stanford Cars datasets demonstrate that the dynamic computational time model works effectively for fine-grained image recognition. The source code of this paper can be obtained from https://github.com/baidu-research/DT-RAM

Zhichao Li, Yi Yang, Xiao Liu, Feng Zhou, Shilei Wen, Wei Xu• 2017
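The core idea above, a recurrent attention loop that also samples a binary continue/stop action at each step, can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: the scalar weights `w_h` and `w_stop`, the use of the image mean as a stand-in glimpse feature, and the threshold classifier are all hypothetical simplifications (in the paper, both the attention policy and the stopping policy are trained with reinforcement learning).

```python
import math
import random

random.seed(0)

def dt_ram_episode(image_mean, w_h, w_stop, max_steps=8):
    """One attention episode with a continue/stop action per step.

    `image_mean` stands in for a glimpse feature; `w_h` and `w_stop`
    are hypothetical scalar weights (learned via RL in the real model).
    Returns (predicted_class, steps_used).
    """
    h = 0.0
    steps = 0
    for t in range(max_steps):
        steps = t + 1
        h = math.tanh(w_h * h + image_mean)             # recurrent update
        p_stop = 1.0 / (1.0 + math.exp(-w_stop * h))    # stop probability
        if random.random() < p_stop:                    # sampled stop action
            break                                       # halt early
    # Toy classifier head: threshold the final hidden state.
    pred = 0 if h < 0.5 else 1
    return pred, steps
```

Because stopping is sampled per input, different images consume different numbers of glimpses, which is what reduces the average computation time relative to a fixed-step RAM.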

Related benchmarks

Task                               | Dataset              | Metric    | Result | Rank
-----------------------------------|----------------------|-----------|--------|-----
Fine-grained Image Classification  | CUB200 2011 (test)   | Accuracy  | 86     | 536
Fine-grained Image Classification  | Stanford Cars (test) | Accuracy  | 93.1   | 348
Image Classification               | Stanford Cars (test) | Accuracy  | 93.1   | 306
Image Classification               | CUB-200-2011 (test)  | Top-1 Acc | 86     | 276
Fine-grained Image Classification  | CUB-200 2011         | Accuracy  | 86     | 222
Fine-grained Image Classification  | Stanford Cars        | Accuracy  | 93.1   | 206
Fine-grained Visual Categorization | CUB-Birds            | Accuracy  | 86     | 26
Fine-grained Visual Classification | CUB-200              | Accuracy  | 86     | 24
