PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition

About

Scene text recognition has attracted increasing attention because of its many applications. Most state-of-the-art methods adopt an encoder-decoder framework with an attention mechanism, which generates text autoregressively from left to right. Despite their convincing performance, their speed is limited by the one-by-one decoding strategy. In contrast, non-autoregressive models predict the results in parallel with a much shorter inference time, but their accuracy falls considerably behind that of their autoregressive counterparts. In this paper, we propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency. Specifically, PIMNet adopts a parallel attention mechanism to predict the text faster and an iterative generation mechanism to make the predictions more accurate. In each iteration, the context information is fully explored. To improve learning of the hidden layer, we exploit mimicking learning in the training phase: an additional autoregressive decoder is adopted, and the parallel decoder mimics it by fitting the outputs of the hidden layer. Because the two decoders share a backbone, PIMNet can be trained end-to-end without pre-training. During inference, the autoregressive decoder branch is removed for faster speed. Extensive experiments on public benchmarks demonstrate the effectiveness and efficiency of PIMNet. Our code will be available at https://github.com/Pay20Y/PIMNet.
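
The idea of parallel prediction refined over several iterations can be illustrated with a small sketch. This is not the PIMNet implementation: the abstract does not state the update rule, so the code assumes a confidence-based scheme in which each pass predicts every position at once, keeps the most confident predictions as context, and masks the rest for the next pass. The names `MASK`, `iterative_parallel_decode`, and `toy_predict` are hypothetical, and `toy_predict` is a stand-in for a real parallel decoder.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "<mask>"  # hypothetical placeholder for a not-yet-decided position

# A predictor maps the current tokens to one (character, confidence) pair per position.
Predictor = Callable[[Sequence[str]], List[Tuple[str, float]]]


def iterative_parallel_decode(predict: Predictor, length: int, iterations: int) -> List[str]:
    """Decode `length` positions using `iterations` parallel passes (assumed scheme)."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    tokens = [MASK] * length
    for step in range(1, iterations + 1):
        # One parallel pass: every position is predicted at the same time.
        preds = predict(tokens)
        # The number of positions kept grows linearly; the last pass keeps all of them.
        keep = max(1, (length * step) // iterations)
        ranked = sorted(range(length), key=lambda i: preds[i][1], reverse=True)
        kept = set(ranked[:keep])
        # Kept positions take the latest prediction; the others are masked again.
        tokens = [preds[i][0] if i in kept else MASK for i in range(length)]
    return tokens


def toy_predict(tokens: Sequence[str]) -> List[Tuple[str, float]]:
    """Toy stand-in for a decoder: position i is only right once i context tokens exist."""
    target = "text"
    context = sum(t != MASK for t in tokens)
    return [
        (target[i] if context >= i else "?", 1.0 / (i + 1))
        for i in range(len(tokens))
    ]
```

With this toy predictor, `"".join(iterative_parallel_decode(toy_predict, 4, 4))` gives `text`, while a single pass gives `t???`, which shows in miniature why extra iterations can recover accuracy that one-shot parallel decoding loses. The number of passes still stays fixed rather than growing with one step per character.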

Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuan Zhang, Ning Jiang, Hongbin Wang, Weiping Wang • 2021

Related benchmarks

| Task | Dataset | Result | Rank |
| Scene Text Recognition | IC13, IC15, IIIT, SVT, SVTP, CUTE80 Average of 6 benchmarks (test) | Average Accuracy 90.5 | 105 |
| Scene Text Recognition | SVT 647 (test) | Accuracy 95.4 | 101 |
| Scene Text Recognition | CUTE 288 samples (test) | Word Accuracy 92.7 | 98 |
| Scene Text Recognition | CUTE | Accuracy 84.4 | 92 |
| Scene Text Recognition | SVTP 645 (test) | Accuracy 85.9 | 54 |
| Text Recognition | IIIT, SVT, IC13, IC15, SVTP, CT | IIIT Acc 95.2 | 37 |
| Scene Text Recognition | IIIT 3000 (test) | Accuracy 96.7 | 35 |
| Scene Text Recognition | ICDAR 2015 | Accuracy (No Lexicon) 83.5 | 35 |
| Scene Text Recognition | IC15 1811 (test) | Accuracy 88.7 | 30 |
| Scene Text Recognition | IC13 1015 (test) | Accuracy 95.4 | 27 |
Showing 10 of 25 rows