AutoSTR: Efficient Backbone Search for Scene Text Recognition

About

Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes. The community has paid increasing attention to boosting performance by improving either the image pre-processing module (e.g., rectification and deblurring) or the sequence translator. However, another critical module, the feature sequence extractor, has not been extensively explored. In this work, inspired by the success of neural architecture search (NAS), which can identify better architectures than human-designed ones, we propose automated STR (AutoSTR) to search for data-dependent backbones that boost text recognition performance. First, we design a domain-specific search space for STR, which contains both choices of operations and constraints on the downsampling path. Then, we propose a two-step search algorithm, which decouples the operations from the downsampling path, for an efficient search in the given space. Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform state-of-the-art approaches on standard benchmarks with far fewer FLOPs and model parameters.

Hui Zhang, Quanming Yao, Mingkun Yang, Yongchao Xu, Xiang Bai • 2020

Related benchmarks

Task                    Dataset                      Result              Rank
Scene Text Recognition  SVT (test)                   Word Accuracy 90.9  289
Scene Text Recognition  IC15 (test)                  Word Accuracy 81.8  210
Scene Text Recognition  IC13 (test)                  Word Accuracy 94.2  207
Scene Text Recognition  SVTP (test)                  Word Accuracy 81.7  153
Scene Text Recognition  IIIT5K                       Accuracy 94.7       149
Scene Text Recognition  SVT 647 (test)               Accuracy 90.9       101
Scene Text Recognition  CUTE                         Accuracy 84         92
Scene Text Recognition  IC15                         Accuracy 81.8       86
Scene Text Recognition  SVT                          Accuracy 90.9       67
Scene Text Recognition  IIIT5K 3,000 samples (test)  Word Accuracy 94.7  59
Showing 10 of 17 rows
